THE EDITORS SAY:


A Red Letter Day 


Front pages of the newspapers have been crowded with news of the
mid-eastern crisis. Mid-July issues carried headlines and prominent articles
dealing with everything from the United States Marines’ landing in Lebanon
to quotations from Nasser, Eisenhower, and Khrushchev. Scarcely noticed
was an announcement by the Board of Regents of the University of California
on July 19 that Stanford University and the University of California
had agreed to pool their graduate school educational facilities. According
to the announced program, graduate students in each university are to be
permitted to take courses in the other, so that the Bay Area, and indeed the
West Coast, could become a center for research.

This enlightened action by two distinguished institutions seems to run
counter to the traditional attitude of many colleges and universities, many
of which permit only a limited amount of course work taken elsewhere to be
applied toward advanced degrees at their own schools. For example, it is
not an uncommon policy to limit the work transferred from another school
to two courses for the master’s degree and four at the doctoral level. In
addition, a research project applicable to an advanced degree must be
supervised or directed by a full-time staff member of the higher academic
ranks.

These policies have created difficulties for graduate schools. The demand
for advanced work has spiraled, and securing competent teachers to supervise
intelligent, thoughtful research has become a critical problem. College
budgets simply cannot be expanded enough to attract staff members with
specialized knowledge in every branch of learning.
Consider, for instance, the area of psychology. A person can no longer
have a thorough background in every phase; he must specialize in one
aspect of the subject. In order to satisfy the desires of its students, an
institution offering a diversified program for psychology majors must employ
dozens of capable specialists. The result is that a staggering sum of money
for salaries is required yearly. The alternatives are either less diversified
programs for the students or, even worse, less competent instructors.

The officials of Stanford and the University of California have taken a
step which could alleviate this condition. For this action they deserve
commendation. Let us hope, however, that the cooperation exists in actuality
and not merely on paper. If these two great universities do pool their resources
so that graduate students interested in research may receive the benefit, the
West Coast may become a great research center.

July 19, then, could be a red letter day for education.—J. H. B.
Teacher Effectiveness Research:
Problems and Status 


DAVID G. RYANS


The problems involved in the study of teacher effectiveness range from
the strictly philosophical, concerned with choice of educational values, to
the technical and quantitative, dealing with the reliability and validity of
measurement and with research design. The present paper is restricted in
scope, as noted in the statements of purpose and basic issues which follow.


Purpose 


Our purpose is to (1) comment briefly on some of the methodological 
aspects of teacher effectiveness research and (2) tentatively suggest some 
generalizations about the predictability of teacher effectiveness. 

The literature covering investigations of the relationship between 
hypothesized predictors and teaching competence is extensive, but it deals 
to a deplorable degree with researches which suffer particularly from (1) 
inadequate consideration of control and (2) lack of replication, and which 
consequently yield ambiguous and difficult-to-interpret results. Further- 
more, many researchers have failed to specify sufficiently the conditions 
under which their studies were conducted (to enable comparison of their 
findings with those of other researchers). Consequently, a considerable 
portion of this discussion will deal with generalizations regarding methodology,
many of which are suggested by deficiencies observed in reported
research.


Basic Issues 


The basic concern in the study of the prediction of teacher effectiveness 
is that of determining how and to what extent various data describing 
teachers (e.g., teacher behaviors and/or conditions affecting teacher behav- 
iors, both of which often are referred to simply as teacher characteristics) 
are either (1) antecedents or (2) concomitants of some designated criterion 
of teaching effectiveness. 


Dr. David G. Ryans is Professor of Education at the University of California
at Los Angeles. Dr. Ryans received his undergraduate training at De Pauw
University and did his graduate work at the University of Minnesota. His writings
include studies on teaching standards, teacher selection, and personality and
interests of teachers. He has also worked in various areas of psychology,
such as achievement test construction, learning, intelligence, motivation, and
persistence. Dr. Ryans is a member of the editorial board of the California Journal
of Educational Research.
The degree to which such relationships (either antecedent-consequent
or concomitant) between teacher characteristics and teaching effectiveness
can be determined depends not only on the real or latent relationship existing,
but also upon (1) how clearly and operationally (in terms of observable
behavior) the agreed-upon criterion can be defined, and how validly and
reliably that criterion can be estimated, (2) how clearly and operationally
the teacher characteristics under study can be defined, and how reliably
and validly these can be estimated, and (3) the adequacy of the sampling,
control, and replication incorporated in the research design.

The matter of definition, including the researcher’s need for operational,
as well as connotative, definitions, is one of first importance in the
investigation of teacher effectiveness [26, 30], but it is a general concern of research
and will not be discussed here. (References 18 and 23 deal, in part, with
criterion definition.) Reliability and validity also are general concerns;
they are the sine qua non whenever measurement is undertaken, and their
treatment in standard sources [7, 27] is germane. Similarly, many concerns
of research design and analysis (sampling, control, statistical inference,
etc.) are involved in the study of teacher effectiveness, but these also apply
generally to many areas of investigation and are best reviewed elsewhere
[10, 11, 32].

Our comments on methodology, therefore, will be limited to matters 
which pertain specifically to research on teacher effectiveness, and will 
relate to (1) obtaining criterion data, (2) obtaining data describing teachers 
(data which may be hypothesized to predict some criterion), and (3) 
various approaches to the investigation of “teacher characteristics”-“criterion 
of teacher effectiveness” relationships. 


Obtaining Estimates of the Criterion 


Some characteristics of “the criterion” of teacher effectiveness, and the
necessary assumptions, philosophical backgrounds, and relevance of various
criteria, have been discussed in the literature during recent years [1, 2, 13,
14, 16, 18, 23, 24]. Here we will be concerned solely with noting various
approaches to the measurement of dimensions, or aspects, of the criterion 
of teacher effectiveness. These are summarized in the outline which follows. 


Methods of Obtaining Criterion Data 
A. Direct measurement of on-going teacher behavior. 


1. Time sampling involving replicated systematic observation and 
immediate assessment (analytical) by trained observers. 


2. Non-systematized observation and immediate assessment (ana- 
lytical or global) by non-trained observers. 
B. Indirect measurement based on preserved records of on-going teacher
behavior (e.g., tape recordings or motion pictures of teacher behavior in
process).


1. Assessment (analytical or global) by trained observers. 


2. Assessment (analytical or global) by non-trained observers. 


C. Indirect measurement by non-trained observers, based on recall of 
teacher behavior and assessment (analytical or global) thereof. 


1. Rating by students or pupils of teacher. 
2. Rating by peers or colleagues of teachers. 
3. Rating by administrative personnel. 

4. Teacher self-rating. 


D. Measurement of a product (student behavior) of teacher behavior. 


1. Direct observation and assessment of product (on-going student 
behavior involving participation in class activities, acceptance of respon- 
sibilities, understanding of principles studied, learning of skills, etc.), 
or of a preserved record thereof, simultaneous with exposure to the 
hypothesized producer (teacher behavior). 

a. Time sampling involving replicated systematic observation 
and immediate assessment by trained observers. 

b. Non-systematized observation and assessment by non-trained 
observers. 


2. Measurement of a product (in this case, a product of student 
behavior which, in turn, is assumed to be a product of teacher behavior) 
immediately following exposure to the hypothesized producer (teacher 
behavior). 

a. Testing of knowledges, skills, understandings, etc. 
b. Inventorying of attitudes, interests, preferences, etc. 


3. Delayed measurement of product (a product of student behavior 
assumed, in turn, to be a product of teacher behavior) with time inter- 
vals intervening between exposure to the hypothesized producer (teacher 
behavior) and evaluation of the product. 

a. Measurement of manifestations of skills, understandings, atti- 
tudes, participations, etc., during succeeding phases of the student's 
education. 

b. Measurement of manifestations of “success” in occupational 
and civic affairs during later life of student. 


4. Measurement of product (product of student behavior assumed, 
in turn, to be product of teacher behavior) based upon “change” in 
student behavior and derived from measurement of student products 
both before and after exposure to the hypothesized producer (teacher
behavior). 
a. Pre- and post-testing of knowledges, skills, understandings, 
etc., and determination of gains or losses. 
b. Pre- and post-inventorying of attitudes, interests, preferences, 
etc., and determination of changes. 


E. Measurement of concomitants (secondary criterion data) of the cri- 
terion of teacher effectiveness. 


1. Concomitants known to be reliably related to the criterion (e.g., 
biographical data, inventory scores, etc., when demonstrated to be em- 
pirically correlated with some accepted criterion). 


2. Concomitants assumed to be related to the criterion (e.g., courses
required for teaching credentials, appearance as revealed by photographs,
test and inventory scores relating to presumably “desirable
knowledge and traits,” etc., believed, but not demonstrated, to be associated
with an accepted criterion).


These several approaches to measurement vary (1) in nature of rationale 
mustered for their support, (2) in reliability of the criterion data produced, 
and (3) in obtained relationship between criterion estimates (thus differ- 
ently derived) and specified predictors (e.g., a given predictor may show 
a different order of relationship with criterion estimates obtained through
one approach than with estimates derived from another criterion measure- 
ment technique). It is important, therefore, that a researcher in this area 
become acquainted with the shortcomings and problems associated with 
each approach to criterion measurement [18]. 

The general approaches to the measurement of a criterion of teacher 
effectiveness involve (as noted) the evaluation of either (1) on-going teacher 
behavior in process, (2) a product of teacher behavior, or (3) concomitants 
of teacher behavior. Measurement of teacher behavior in process is the 
most direct approach; measurement of products and of concomitants are 
more indirect and more subject to the effects of confounding conditions. 

Generally speaking, concomitants (secondary criterion data) are not to 
be employed for criterion measurement if direct measurement of on-going 
teacher behavior or measurement of products of teacher behavior can be 
used. However, in investigations involving very large samples (where other 
measurement approaches may be impractical) the use of known concomitants
as substitutes for process or product data can be defended.

Of the measurement approaches employing observation and assessment
of on-going teacher behavior, only time sampling with replicated systematic
observations by trained observers yields data sufficiently reliable for
use in basic research. Less well-controlled variations may be used, however,
when only coarse discrimination (e.g., “best” and “poorest” teachers with
respect to some criterion) is required, and when the increased expected
error is recognized. Various assessment techniques have been employed,
among which the more reliable seem to be (1) graphic scales with
operationally-defined terminal points and/or units [14, 19, 23], (2) observation
check-lists [15], and (3) forced-choice devices [9]. The principal
shortcoming of observation and assessment techniques has been poor reliability,
a difficulty which recent research has indicated can be overcome
by giving care to definition and to scale development, and by adequate
training of the observers or judges.
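Why replicated observation matters can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that this paper does not itself present. The sketch assumes that the k observation periods are parallel, that is, equally reliable samples of the same teacher behavior; the single-visit reliability of .40 is a hypothetical value chosen only for illustration.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the average of k parallel observations,
    given the reliability r_single of a single observation.
    Assumes the observations are parallel (an illustrative assumption)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical reliability of one classroom visit by a trained observer.
single_visit = 0.40
for visits in (1, 2, 4, 8):
    print(visits, round(spearman_brown(single_visit, visits), 2))
```

Under these assumptions, averaging over eight visits raises the projected reliability from .40 to about .84. If the visits are not parallel (for example, if observers drift or classroom conditions change), the projection will be too optimistic.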


Product measurements (estimates of the behavior or achievement of 
teachers’ pupils) have been widely acclaimed as particularly appropriate 
criterion data, but have been used only occasionally in the study of teacher 
effectiveness. The most defensible of the product measurement approaches 
are (1) the direct observation and assessment of pupil behavior during 
exposure to a teacher who is hypothesized to be at least a partial producer 
of the pupil behavior [23] and (2) measurement of pupil change from 
before to after exposure to the hypothesized teacher-producer [13]. 


Actually, the relevance and appropriateness of the measurement of pupil 
behaviors and their products as reflectors of teacher performance may be 
less real than they at first seem, for the producers of (or contributors to) 
pupil behavior or pupil achievement are numerous, and it is difficult to 
determine and partial out the contribution made by a specified aspect of the
producing situation (such as the teacher) to a designated product (pupil 
behavior). The facets of the product-criterion (various attitudes, skills, and 
understandings in various areas) are similarly numerous, and each one 
studied must be capable of valid measurement and must be isolable for 
study. Comparability of estimates of various components of the product 
also becomes a special problem when pupil behavior or achievement is
employed as the criterion. When evaluation of the product is accomplished 
by obtaining estimates of pupil change, the matter of variable potential 
gain (pupils who score high on the initial measurement more closely approx- 
imating their “ceilings” than pupils who originally score low do theirs) is 
particularly plaguing. However, if the rationale of the product (pupil per- 
formance) criterion is accepted, and if the complex control problem (involv- 
ing a multiplicity of producers) and the multidimensionality of the criterion 
can be adequately coped with, pupil change becomes a challenging 
approach to teacher effectiveness. Mitzel and Gross [13] have commented 
critically on the development and use of the pupil change criterion. The 
review of Morsh and Wilder [14] covers the literature in the area of teacher 
effectiveness to 1952 and presents the case both for and against the employ- 
ment of measurements of pupil change (and of observation and assessment, 
as well). 
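The ceiling problem in pupil-change criteria can be made concrete with a small sketch. It compares the raw gain with one commonly used adjustment, the gain expressed as a fraction of the gain still possible at pretest. This index is not proposed in the paper and has weaknesses of its own; the pupils' scores and the 100-point ceiling below are hypothetical.

```python
def raw_gain(pre: float, post: float) -> float:
    """Simple post-minus-pre difference."""
    return post - pre


def relative_gain(pre: float, post: float, max_score: float) -> float:
    """Gain expressed as a fraction of the gain still possible at pretest."""
    room = max_score - pre
    if room <= 0:
        raise ValueError("pretest score is already at the ceiling")
    return (post - pre) / room


MAX_SCORE = 100  # hypothetical test ceiling
# Hypothetical pupils: one starting low, one starting near the ceiling.
pupils = {"low starter": (40, 70), "high starter": (85, 95)}
for name, (pre, post) in pupils.items():
    print(name, raw_gain(pre, post), round(relative_gain(pre, post, MAX_SCORE), 2))
```

Raw gain credits the low starter with three times the improvement of the high starter, while the relative index ranks them the other way (.50 against about .67). Neither index removes the control problem of multiple producers discussed in the paragraph above.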
In dealing with any of the several approaches to the measurement of
the criterion, the researcher should be thoroughly familiar with, and guard 
against, the sources of criterion measurement bias pointed out by Brogden 
and Taylor [5], Ryans [18, 23] and others. 


Obtaining Estimates of the Predictors 


Numerous predictors of teacher effectiveness have been proposed (often 
with little apparent consideration of rationale) and efforts to determine the 
relationship between many of these and data assumed to provide estimates 
of the criterion frequently have been reported in the literature. Most of 
the researches have suffered from poor design and analysis, but clues 
derived from various studies do suggest teacher traits and conditions which 
may be hypothesized to be related to certain dimensions of teacher 
effectiveness. 

The chief technical problems faced in obtaining estimates of predictors 
are those well known to educational research and measurement workers, 
namely validity and reliability [7, 27]. Obviously, the prediction of any 
criterion is limited by the reliability and validity of measurements of hypoth- 
esized predictors. 
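The limit that unreliability places on prediction can be stated with two standard results of classical test theory (not formulas given in this paper): an observed predictor-criterion correlation cannot exceed the square root of the product of the two reliabilities, and Spearman's correction for attenuation estimates the correlation between true scores. The reliabilities used below are hypothetical, and the correction addresses reliability only; it says nothing about whether either measure is valid.

```python
import math


def max_observable_r(rel_predictor: float, rel_criterion: float) -> float:
    """Upper bound on the observed predictor-criterion correlation under
    classical test theory: sqrt of the product of the two reliabilities."""
    return math.sqrt(rel_predictor * rel_criterion)


def disattenuated_r(observed_r: float, rel_predictor: float, rel_criterion: float) -> float:
    """Spearman's correction for attenuation: estimated correlation between
    true scores, given the observed correlation and both reliabilities."""
    return observed_r / max_observable_r(rel_predictor, rel_criterion)


# Hypothetical: a fairly reliable test and a less reliable rating criterion.
print(round(max_observable_r(0.90, 0.50), 3))
print(round(disattenuated_r(0.30, 0.90, 0.50), 3))
```

With a predictor reliability of .90 and a criterion reliability of .50, no observed correlation above about .67 is possible, and an observed .30 corresponds to an estimated true-score correlation of about .45.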

Among the predictor characteristics which have been investigated, and 
for which measurement has been attempted, are: Scores on tests of verbal 
and other cognitive abilities; Scores on tests of knowledge and understand- 
ing of general and special subject matter; Scores on tests of professional 
information; Course marks representing academic achievement; Course 
marks or ratings representing performance in practice teaching; Scores 
derived from inventories and/or projective devices developed to measure 
various “personality” traits, and emotional and social adjustment; Scores on 
attitude scales and inventories developed to measure teacher-student
relationships; Age; Experience; Speech and voice characteristics; and others.

It is very important to note at this point that similarly named predictor 
measures employed in different reported studies are not necessarily measur- 
ing the same underlying characteristic of the teacher and, quite apart from 
sampling errors, they do not necessarily yield similar correlations with what 
purport to be similar criterion dimensions. Discrepancies in conclusions 
reported in the literature often may be traced to this lack of agreement 
with respect to operational definition of the predictor (in addition to cri- 
terion inadequacies and lack of control of relevant variables). 


Approaches to the Predictor-Criterion Relationship 


Still another set of conditions which lead to variability in the degree of
association that may be found between hypothesized predictor measures
and measures of a teacher effectiveness criterion has to do with the approach
to the predictor-criterion relationship involved in the research design.
Questions such as those which follow must be considered.


A. Does the investigation purpose to determine concomitant or antece- 
dent-consequent relationships? 


Predictors may be related in two ways to the criterion they are hypothe- 
sized to predict: (1) there may be an antecedent-consequent relationship, 
where the predictor is a producer or causal factor with regard to the cri- 
terion; and (2) there may exist a concomitant or correlational relationship, 
where it can be observed only that the predictor varies as the criterion 
varies. Study of the first kind of relationship implies experimental procedure 
with introduction of an hypothesized predictor (following some original 
measurement) and judgment of its influence on the criterion. Study of the 
second implies the estimation of extent to which either (a) presence- 
absence or (b) measured amounts of a hypothesized predictor may be 
correlated with manifestations of the criterion. There has been little attempt 
to seek antecedent-consequent relationships. Most investigations have been 
at an exploratory level, and have attempted merely to find correlates or 
concomitants of a criterion of teacher effectiveness. 


B. Is prediction of the criterion of teacher effectiveness to be attempted 
from single bits of information (e.g., answers to single questionnaire, test, 
or inventory items) or from “scores” based upon combinations of such bits 
of information into composite sets of homogeneous items or scales? And 
if the latter, is the combination to be accomplished with equal or differen- 
tial weighting of the component bits? 

The value of replication is familiar to measurement and research workers
and, other things equal, predictors composed of multiple bits of information
will be more stable and will make for more reliable prediction than single
isolated items. An extension of this question is concerned with whether
prediction of the criterion is to be attempted from a single composite
predictor (a “score” from one instrument alone) or from a combination of
predictor scores (scores on several instruments), weighted perhaps in light
of multiple regression weights. If the “scores” to be combined are not
highly intercorrelated, there will be an advantage in using estimates derived
from several different predictors.
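The advantage of combining predictors that are not highly intercorrelated can be seen in the standard formula for the multiple correlation of a criterion with two optimally weighted predictors. In this sketch both validities (.30) and the intercorrelations tried are hypothetical, and the formula treats the correlations as known population values rather than sample estimates.

```python
def multiple_r_two(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with the least-squares weighted
    combination of two predictors.

    r1, r2: each predictor's correlation with the criterion
    r12:    the correlation between the two predictors
    """
    if abs(r12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    if r_squared < 0:
        raise ValueError("the three correlations are mutually inconsistent")
    return r_squared ** 0.5


# Hypothetical: two instruments, each correlating .30 with the criterion.
for r12 in (0.0, 0.5, 0.9):
    print(f"intercorrelation {r12:.1f}: R = {multiple_r_two(0.30, 0.30, r12):.3f}")
```

With two predictors each correlating .30 with the criterion, the optimally weighted composite reaches about .42 when they are uncorrelated, about .35 when they correlate .50, and only about .31 when they correlate .90, barely better than either alone.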


C. Is the derivation of predictors (original selection of items, or combi- 
nations of items) based upon experience with a single sample, or has 
replication been employed involving multiple samples of teachers? 

The latter inevitably results in some regression (shrinkage) of the
obtained relationships, but it has the advantage of allowing greater
confidence in the predictors which survive the several samples.


D. Is prediction concerned with (1) additional random samples of the 
same population as that from which was drawn the samples employed in 
deriving the predictors (e.g., cross-validation or hold-out sample validation
involving a defined population, such as “male secondary teachers”), or (2)
samples of a population other than that from which the predictors were
originally derived, either (a) employing the same criterion measure (validity
generalization, e.g., estimation of the relationship between a predictor and 
a criterion consisting of assessments of elementary school teacher effective- 
ness provided by trained observers, when the predictors originally had been 
derived with reference to the same criterion, but for secondary teachers), 
or (b) a different criterion measure (validity extension, e.g., determination 
of the relationship between the predictor and pupils’ ratings of teachers in 
school system X, when predictors originally had been derived with reference 
to a criterion provided by assessments made by trained observers of teachers 
of school system Y)? 


E. Is prediction attempted for predictor and criterion data which have
been collected at about the same time, or with the predictor data obtained
first and the criterion data obtained after some interval of time?

Other things equal, prediction from predictor and criterion data obtained
concurrently (or when the intervening time is relatively short) is likely to be
more successful than when the time interval is a substantial one
and prediction is attempted for a criterion to be obtained in the future.


F. Is prediction to be attempted with the predictor data obtained under
“incentive” conditions (e.g., in connection with selection for employment)
or under “non-incentive” conditions (e.g., in connection with research,
involving anonymity of the participants)?

The extent to which prediction of the criterion may be altered by the
influences of incentive conditions will vary depending upon a number of
factors, including the degree to which the predictors are disguised, the
teachers’ knowledge of the objectives and values represented in the
criterion measure employed, and others.


G. Is prediction attempted for separate criterion dimensions, singly 
(e.g., effective classroom discipline), or for a composite contributed to by 
a number of heterogeneous components or dimensions (e.g., over-all teaching
effectiveness)?

Other things equal, the prediction of separate well-defined dimensions 
for which reliable measurement procedures exist or can be developed is 
likely to be possible to a greater extent than the prediction of the more 
ambiguous “over-all” teacher effectiveness. 


H. Is prediction of teacher effectiveness sought on an actuarial, or 
group, basis or for particular individuals? 

As a group, teachers of the age range of twenty-five to thirty may mani- 
fest significant superiority on a designated criterion of teacher effectiveness 
compared with a group of teachers between fifty-five and sixty years of age, 
and such information may have important implications for the administrative-supervisory
staff of a school system. Yet, an individual teacher in the
latter age range may very well be superior to the average of the younger 
group, or an individual younger teacher may be well below the average of 
the older group, illustrating the substantially greater uncontrolled variability 
involved in individual as compared with group prediction. 
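The contrast between group and individual prediction can be quantified under a simple model, assumed here for illustration: criterion scores in both age groups are normally distributed with equal standard deviations, and the younger group's mean is d standard deviations higher. The value d = .5 and the group size of 100 are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def p_older_outscores_younger(d: float) -> float:
    """Chance that a randomly chosen older teacher outscores a randomly chosen
    younger one, if both groups are normal with equal SD and the younger
    group's mean is d SDs higher (an illustrative model)."""
    # The difference between two independent draws has SD sqrt(2) in SD units.
    return STANDARD_NORMAL.cdf(-d / 2 ** 0.5)


def z_for_group_difference(d: float, n_per_group: int) -> float:
    """z for the difference between two group means of size n_per_group,
    treating the SD as known and expressing the difference in SD units."""
    standard_error = (2 / n_per_group) ** 0.5
    return d / standard_error


D = 0.5   # hypothetical standardized difference between the age groups
N = 100   # hypothetical number of teachers in each group
print("individual:", round(p_older_outscores_younger(D), 2))
print("group z:   ", round(z_for_group_difference(D, N), 2))
```

Under these assumptions, a difference that is unmistakable for groups of 100 (z of about 3.5) still leaves the older teacher ahead in about 36 percent of random pairings of one older and one younger teacher.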

Still other approaches to the prediction problem might be added, but 
the foregoing are representative of some of the principal considerations 
involved in the design of research into the predictor-criterion relationship. 


Some Tentative Generalizations 


The understanding of teacher characteristics and of teacher effective- 
ness and its prediction has proceeded slowly. Relatively little attention has 
been accorded systematic theory of teacher behavior and many of the 
investigations reported seem to have been undertaken with blunderbuss 
motivation and little attempt to relate them either to other research or to 
a theoretical background. The paucity of dependable knowledge of “con- 
tributors” to teacher effectiveness (conditions influencing teaching compe- 
tence) undoubtedly is related to the fact that little attention has been 
devoted to theory development, thus restricting the generation of hypotheses 
[2, 25]. 

It should also be noted that most of what is known about teacher characteristics
and their relation to teaching has been derived from correlation
research, relatively less being understood of the dependence of teacher 
behavior on specified antecedents, or of functionally dependent relationships 
between predictors and the criterion. 

Nevertheless, some generalizations are suggested by similarities in results 
reported by various researches [3, 4, 6, 8, 12, 14, 17, 20, 21, 22, 23, 28, 
29, and 32]. Such tentative conclusions as those which follow appear to be 
in order. 


1. The predictability of teacher effectiveness probably is affected by 
the multi-dimensionality of the criterion. There is evidence that predictions 
can be made with better than chance results for specified dimensions of 
the criterion. The prediction of over-all teacher effectiveness poses numer- 
ous problems and certainly is limited by the relativity of concepts of “effec- 
tiveness” and the resulting lack of operational definitions. Prediction is 
possible only to the extent that some agreement can be reached (among 
educators, and also the public) regarding the dimensions making up “over-all
effectiveness” (involving, of course, acceptance of a common set of
educational values) and how they should be combined to form a composite. 


2. The predictability of teacher competence varies with the degree of 
control it is possible to impose in dealing with the multiplicity of predictors 
and the multidimensionality of the criterion. 
3. The predictability of the criterion varies with the kind of measure
employed in obtaining the criterion data. Different approaches to the esti- 
mation of the criterion behavior (e.g., supervisors’ ratings vs. pupil ratings 
of the teacher behavior) result in criterion data which are not similarly 
predictable from a given predictor. 


4. The predictability of the criterion differs with the adequacy (relia- 
bility and validity) of measures of (a) the criterion and (b) the predictor 
variables, which are employed. 


5. Predictability of the criterion is limited to such a degree by condi- 
tions associated with measurement of the criterion, measurement of pre- 
dictors and practical conditions limiting control in the research, that 
relationships representing common variance of perhaps one-fifth or one- 
fourth of the total variance (correlations of .45 or perhaps .50) probably 
approach the maximum to be expected except in chance instances. 


6. The predictability of a dimension of teacher effectiveness from a 
specified predictor very probably varies depending upon the cultural milieu 
which provides the setting for a particular investigation, especially such 
aspects of the culture as values and objectives prominent in the teacher 
training curriculum at the time the teachers studied were in college. 


7. The predictability of the criterion varies directly with the degree of 
similarity between (a) the sample with respect to which predictors are 
derived and (b) the sample to which the predictors are applied in determin- 
ing predictor-criterion relationships. 


8. The predictability of a criterion dimension differs with the particular 
teacher population (e.g., Grade 1-2 women teachers, men science teachers, 
etc.) studied. 

9. The predictability of a criterion varies inversely with the time interval 
separating the obtaining of (a) predictor measurements and (b) criterion 
measurements. 

10. The predictability of a criterion probably differs depending upon 
the association of either (a) incentive or (b) non-incentive conditions with 
the obtaining of predictor data. 

11. The regression of predictor measurements on criterion measurements
frequently is curvilinear (e.g., positive correlation between amount
of teaching experience and specified criterion measures of the effectiveness 
of secondary school teachers during first five years or so, followed by leveling 
off and decline in criterion measurements with extensive experience). 


12. Prediction of teacher effectiveness must be viewed largely in the 
actuarial sense. Successful prediction for groups of teachers is well within 
the realm of possibility. Individual prediction, however (as generally is the 
case in attempting to predict human behavior), is much more limited and 
is accomplished with a lesser degree of confidence. 
Tentative Guidelines for
Proper and Improper Practices 
With Standardized Achievement Tests 


ANTON THOMPSON 


School systems have had years of successful experience in the use of 
standardized achievement tests, that is, tests of arithmetic, reading, and 
the like. In many of these systems practices relating to testing have 
been sound. Administrators, counselors, and teachers have recognized stand- 
ardized tests for what they are—specialized instruments of measurement 
which must be given under standard conditions in order to obtain scores 
that are reliable and valid. There has been general recognition that money 
spent for standardized tests would be money thrown away if the tests were 
improperly used.

Even in those school systems that have generally excellent records with 
respect to the use of standardized achievement tests, questions are raised 
from time to time (sometimes by those who have but recently joined a staff 
and sometimes by veteran employees) that suggest a need for guidance 
concerning the propriety, ethics, correctness, or appropriateness of certain 
specific administrative and instructional practices. Such questions are likely 
to be phrased along these lines: “Do you think it would be okay if ... ?”
“Would there be anything wrong in our ... ?” “Since our regular textbook
doesn’t cover xxx and there are some questions on xxx in the standardized
test we use, couldn’t I ... ?” “I have one boy in my class who is just
hopeless, and he wouldn’t be able to do a thing on the standardized test, so
I was just wondering if it wouldn’t be all right if ... ?” “Couldn’t we raise
our scores a lot by just ... ?” “Is there any sense in being ‘persnickety’
about standardized tests when we know that in other parts of the country
they ... ?” This last question may also be used to refer to “other schools”
rather than to “other parts of the country,” and it can also relate to “all
the old-timers around here” who supposedly do certain things.


Anton Thompson, Director of Research of the Long Beach Unified School
District for the past thirteen years, received his Doctor of Philosophy degree
from the University of Minnesota. Dr. Thompson’s experience in education
includes work as an elementary school teacher in South Dakota; as a high
school teacher, principal, and superintendent in Minnesota; and as an
Associate Professor at Arizona State College at Flagstaff and at the
University of Minnesota.

Questions that relate to the propriety of various actions believed to 
have an effect on standardized achievement tests usually fall in one of three 
groups: (a) questions that are easy to answer because the authorities in the 
area of testing are agreed in their approval or disapproval of the questioned 
practice; (b) questions that are slightly difficult to answer because, even 
though they are clearly related to the usual do’s and don'ts treated in test- 
ing manuals, they are not specifically covered in most guides; and (c) 
questions that are difficult (if not impossible) to answer with confidence, because
they relate to practices that are in a “twilight zone” between proper and 
improper procedures or because they may be either proper or improper, 
depending upon certain other circumstances. 

It is the purpose of this paper to attempt to specify answers to certain 
of the troublesome questions that pertain to proper and improper practices 
relating to the use of standardized tests. Some of these answers are based 
on the writings of such accepted authorities in the area of measurement as 
Lindquist and Traxler; other answers have been formulated on the basis of 
personal judgment. 

At the outset, we wish to emphasize the use of the word tentative in the 
title of this article. Possibly at a future time some competent group with 
wide responsibilities in the field of education will develop an improved and 
more authoritative statement on the subject of the common ethical problems 
met in the large-scale use of standardized achievement tests. Until then, 
however, school districts may need to consider the advisability of accepting, 
modifying, or rejecting such guidelines as are proposed below. 


Some Acceptable and Proper Practices 


1. It is acceptable and proper to inform pupils some days in advance
that they are to take a standardized test. Whether this practice is desirable 
in the lower elementary grades is a debatable question. However, the 
answer does not hinge on any matter of ethics. 


2. It is acceptable and proper to explain to pupils, well before the day 
a standardized achievement test is given, something about the purposes and 
general form of the test. Such a discussion would not include any refer- 
ences to the specific content of the test. It might properly cover such 
matters as the various types of objective questions (multiple-choice, com- 
pletion, etc.), the fact that some items will be easy and some will be very 
difficult, the fact that pupils shouldn’t worry if they can’t answer every 
question, the importance of working rapidly without racing, the advisability 
of making “intelligent” guesses and of avoiding “wild” guesses. (If this 
statement is not sufficiently detailed, we suggest that you write for a copy
of “Test Information for Students,” a one-page leaflet that can be obtained 
for 1¢ from the Educational Records Bureau, 21 Audubon Avenue, New 
York 32, N. Y.) 


3. It is acceptable and proper to familiarize pupils with the mechanics 
of a standardized achievement test before the actual testing begins, through 
the use of practice exercises. However, it is important to keep in mind that 
the purpose of such exercises is to familiarize the pupil with various types 
of objective questions and/or to teach pupils how to mark their answers to 
the questions—either on the booklets or on separate answer sheets. It is 
not the purpose of these practice exercises to offer any clues as to the
content of an approaching test. If commercially published practice exercises
are not used, the counselor or teacher who constructs such exercises should
use no item that duplicates or nearly duplicates an actual question from
a standardized test. (One teaching cue: to get the pupil to focus his
attention on the mechanics of the sample questions and how to mark the
answers, ask very easy questions, as most commercial test-makers do whenever
they include practice items at the beginning of their tests.)


4. It is acceptable and proper for a principal or counselor to provide 
teachers with general information concerning achievement tests that have 
been scheduled for use in a given grade or department. 


5. It is acceptable and proper for a teacher whose course of study rec- 
ommends the development of a given learning to continue to teach the 
recommended material even though the teacher knows that Test — con- 
tains a specific question relating to that particular skill, information, or 
understanding. (For example, in a course concerned with an understanding 
of the content of the amendments to the U. S. Constitution it is proper to 
continue to teach about the 16th Amendment.) Knowledge of the general 
content of an approaching test does not require the teacher to go out of 
his way to avoid any reference to the learnings in the test. The real test of 
propriety is whether the teacher omits important learnings because they are
not in a given test, whether he includes materials not in the course of
study because they cover items that are in a given test, and whether he
consciously gives undue attention to particular learnings in the course of
study because he knows they are in a given test.


6. It is acceptable and proper to try to bring about optimum motiva- 
tion of the pupils for the taking of a standardized achievement test. This 
means trying to get pupils to do as well as they can—to be “cooperative” 
test-takers. It does not mean that pupils are to be threatened or made 
extremely tense and anxious. 
7. It is acceptable and proper in giving a machine-scored test to pass
around a sample well-marked answer sheet before the test is begun, and 
to point out the possibility that the pupils will lower their test scores if they 
don’t try to mark their answer sheets like the sample. 


8. It is acceptable and proper for the person who gives standardized 
tests to guard the material well before, during, and following the actual 
giving of a test. (Suggestion on procedure: At the secondary level where 
machine-scored tests are commonly used, it is advisable in the larger schools 
to use a serial number on the test booklets. If a test examiner is given 
Booklets No. 41 to 80 at the beginning of the day, he should be able to 
account for Booklets No. 41 to 80 at the end of the day.) 


9. It is acceptable and proper to combine two or three class sections in 
one large testing group. (Suggestion on procedure: To make large-group 
testing operate smoothly, (a) pupils should be sufficiently mature; (b) the 
examiner should be able to direct a large group; (c) a satisfactory testing 
room should be available; and (d) classroom teachers of the groups being 
tested should aid the examiner by serving as proctors.) This statement 
doesn’t mean that we consider testing in large groups superior to testing 
in small groups. Research is needed as to the “best” size. However, no 
question of propriety or ethics is involved. 


10A. It is acceptable and proper to make instructional use of the results 
of an item-analysis (a study of the difficulty of each item for a group of 
pupils) in remedial work immediately following a survey. 


10B. It is acceptable and proper to make instructional use of the results 
of an item-analysis from a previous year’s testing in locating types of learn- 
ings covered in the local course of study which appear to have been taught 
ineffectively. This statement contains several intentional limitations. (a) It 
is the type of learning—the multiplying of a two-place number by another 
two-place number or the agreement of subject and verb or the use of the 
colon—with which the teacher is properly concerned. The particular test 
item that may happen to involve the multiplying of 46 by 23 or the place- 
ment of a colon after “Dear Sir” is not the important thing. (b) The 
teacher’s proper concern is with those learnings that are covered in the 
local course of study. Illustration: If a test item asks for the capital of 
Chile, if an item-analysis should show that most local children don’t know 
the correct answer, and if the local course of study doesn’t provide for any 
study of Chile, it is not the proper job of the teacher to “bootleg” a discus- 
sion of Chile with his class just prior to their taking the test. 


11. It is acceptable and proper for a local curriculum committee to 
study the results of an item-analysis as one kind of evidence that may be 
used in determining whether a school system’s curriculum is sufficiently
comprehensive and well-balanced. Suppose, as in No. 10B above, that an 
item-analysis should show that 98 per cent of a district’s seventh graders 
didn’t know the capital of Chile and suppose this called attention to the 
fact that in the first seven grades the local course of study required no 
study of Chile. It would be proper for the curriculum committee to con- 
sider whether or not there should be any revision of the course of study— 
and the fundamental reason for making the revision, if that happened to 
be the group’s decision, would not be whether this would help the district’s
children answer Item 37 of Test X.


12. It is acceptable and proper for a teacher to return a pupil’s scored 
achievement test booklet or answer sheet to the pupil for use in class dis- 
cussion or in individual conference. The booklets or answer sheets should 
not be taken from the classroom, since they may then fall into the hands 
of others who will be taking the same test. The usual purposes of returning 
a scored booklet or answer sheet are (a) to inform the pupils concerning 
their individual results on the test, and (b) to call the attention of the 
class to any group weaknesses in learning revealed by the teacher’s study 
of the frequency of errors on each test item—weaknesses that may suggest 
a need for subsequent re-teaching and re-learning.
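The item-analysis referred to in Nos. 10A, 10B, 11, and 12 (the difficulty of each item for a group of pupils, or the frequency of errors on each item) is simple tabulation. The sketch below is a minimal illustration, assuming scored answers are recorded as 1 (right) or 0 (wrong or omitted), one row per pupil; the function names and the 50 per cent cut-off are hypothetical choices, not part of any published procedure. Consistent with No. 10B, the flagged items are meant to point to types of learning, not to particular questions to be drilled.

```python
from typing import List

Scores = List[List[int]]  # scores[p][i] is 1 if pupil p got item i right, else 0


def _check(scores: Scores) -> int:
    """Validate the score table and return the number of items."""
    if not scores:
        raise ValueError("no pupils in the table")
    n_items = len(scores[0])
    for row in scores:
        if len(row) != n_items or any(s not in (0, 1) for s in row):
            raise ValueError("each row needs a 0/1 score for every item")
    return n_items


def item_difficulty(scores: Scores) -> List[float]:
    """Per cent of the group answering each item correctly."""
    n_items = _check(scores)
    n_pupils = len(scores)
    return [100.0 * sum(row[i] for row in scores) / n_pupils for i in range(n_items)]


def error_frequency(scores: Scores) -> List[int]:
    """Number of pupils who missed (or omitted) each item."""
    n_items = _check(scores)
    return [sum(1 - row[i] for row in scores) for i in range(n_items)]


def weak_items(scores: Scores, cutoff: float = 50.0) -> List[int]:
    """1-based item numbers answered correctly by fewer than `cutoff` per cent."""
    return [i + 1 for i, pct in enumerate(item_difficulty(scores)) if pct < cutoff]


if __name__ == "__main__":
    # Four pupils, three items (made-up scores for illustration only).
    scores = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1]]
    print(item_difficulty(scores))  # [75.0, 75.0, 25.0]
    print(error_frequency(scores))  # [1, 1, 3]
    print(weak_items(scores))       # [3]
```

A low per cent correct says only that a type of learning may have been taught ineffectively; whether it belongs in the local course of study is a separate question, as No. 11 points out.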


Some Unacceptable and Improper Practices 


1. It is not acceptable and proper for a teacher to coach pupils on the 
subject matter of a specific standardized test. Such coaching will invali- 
date the scores earned by the pupils. Not only will such scores have no 
value, they may actually do the pupil harm if they are recorded and used 
for guidance or instructional purposes. Testing authorities are in complete 
agreement that teachers and pupils should take standardized tests “in stride.” 


2. It is not acceptable and proper to use a standardized achievement 
test as an abbreviated course of study. A good survey-type test is not a 
collection of items of the “minimum-essentials” type. Such a test is merely 
a sampling of many learnings, and the test-maker did not intend that 
teachers should have exposed all their pupils to all the particular learnings 
included. Example 1: A test given at the fifth grade level may contain a 
few items that deal with skills ordinarily learned in the eighth grade; the 
fifth grade teacher should not teach these skills “because they are covered 
in the — Test.” Example 2: A test given at the eleventh grade level may 
contain a few factual questions which can’t be answered on the basis of 
the approved local textbooks and supplementary materials; the eleventh 
grade teacher should not conduct a lecture session in which he “covers”
those facts. Example 3: A course of study for a given high school grade 
may recommend that one-third of the course be given to a study of Ameri- 
can literature. A teacher of the course, knowing that the standardized 
English test given at the end of the term contains no items on literature, 
should not skip the teaching of literature because his students won't be 
tested on that part of the total course. Teachers don’t need to worry about 
“covering” all the items in a good test since the pupils in the norm popula- 
tion were not exposed to all these learnings before they took the same test. 
Therefore, a student doesn’t need to be “dragged through” all the learnings 
to do as well on the test as his mental equal in the norm group. 


3. It is not acceptable and proper for a teacher to make a collection of 
various standardized tests for the purpose of lifting items for use in his 
“teacher-made” tests. In a few fields such as secondary social studies, 
teachers can purchase collections of test items—collections which have been 
prepared to help teachers in building their own tests. This practice is legiti- 
mate, of course. 


4. It is not acceptable and proper for a teacher to develop instruc- 
tional materials which include a few specific items from a standardized test 
that is to be given even though the materials also include other content 
not taken directly from the test. Once test items have been lifted out of 
a test and used as instructional material, the value of that test as a standard- 
ized measuring instrument has been destroyed. (The person who is over- 
weight does not solve his problem by tampering with the scales.) 


5. It is not acceptable and proper to use the report of a previous item- 
analysis as a “pony” for getting other pupils ready for that test. 


6. It is not acceptable and proper to use a particular standardized test 
as a model for the construction of an elaborate set of drill exercises that 
parallel the content and format of that test. This isn’t teaching the exact 
content of a test. This is simply “snuggling up” as close to the original as 
the teacher can without duplicating the original items. Some would say 
that this practice is just another variation of using a test as an abbreviated 
course of study. 


7. It is not acceptable and proper to give one form of a standardized 
test a few days or weeks before a second form of the test is to be given as 
a district-wide survey. 


8. It is not acceptable and proper to exclude a regular pupil from tak- 
ing a district-wide survey on the grounds that his teacher thinks the pupil 
is a poor learner or because the pupil is said to “get all fouled up” when 
he takes a test. Presumably the norm group also included some children
who were dull and some who hated to take tests. 


9. It is not acceptable and proper to exclude a whole section of the 
class from taking a district-wide survey on the grounds that they are mem- 
bers of a low-ability group. We would make one exception to this general 
rule. Pupils who are so mentally handicapped as to be considered a part 
of a special education program need not be included in surveys intended 
for the regular pupils of the district. 


10. It is not acceptable and proper to accept or promote a general atti- 
tude of “Don’t-be-fussy-about-methods-so-long-as-they-raise-the-test-scores.” 
Such an attitude sets high scores for their own sake as the goal of instruction. 


11A. It is not acceptable and proper to neglect in any way the instruc- 
tion of the weakest pupils and/or the strongest pupils in a group in order 
to concentrate more effort on raising the median score of that group. 


11B. It is not acceptable and proper to neglect in any way the instruc- 
tion of the less able and the average pupils in a group in order to try to have 
a small number of the more capable pupils earn exceptionally high scores 
on a survey test. 


12. It is not acceptable and proper to alter the time limits for a stand- 
ardized test. 


13. It is not acceptable and proper, after a survey test is under way, 
to give pupils any help beyond that allowed by the test manual. For most 
tests, this means that the tester can’t help a pupil who raises his hand to 
ask the meaning of a particular test question, etc. Although the testers are 
usually given some leeway in answering questions not dealing with the 
content of the test, it is a safe rule to err on the side of explaining too little 
rather than too much. 


14. It is not acceptable and proper for the administrator of a standard- 
ized group achievement test to give special assistance to the poor readers in 
a class by reading test items aloud. This holds true even though the test 
is an arithmetic test and a teacher is confident that certain children who 
could not otherwise do so could correctly solve a test exercise if it were just 
read aloud to them. If a test was normed without reading items out loud 
to the children in the standardization sample, then the test must be given 
the same way if pupils’ scores are to be interpreted through the use of 
the publisher’s norms. 


15. It is not acceptable and proper to make individual pupils over- 
anxious and tense concerning the outcome of a standardized achievement 
test, nor to set impossible goals for a class. 
16. It is not acceptable and proper for a counselor or school administrator
to foster a spirit of personal rivalry among teachers or between schools
concerning the scores earned by their pupils on standardized achievement 
tests. A major reason for avoiding the development of this spirit is that 
such rivalry may lead to many other wrong or improper practices having 
to do with standardized tests. 


17. It is not acceptable and proper for a school employee to be a 
spreader of rumors concerning improper testing practices allegedly taking 
place in other classrooms, or in other schools, or in other school systems. 
One might speculate at length on the reasons why such tales are told. It is 
probable that one important factor is simply the readiness of so many to 
listen with interest to any story that deals with alleged wrongdoing. 


Measuring Social Relationships
In a Special-Grouping Program
For Fast-Learners


MARY GOLDWORTH


The growing concern about education of gifted children has raised ques- 
tions as to the desirability and effectiveness of various types of educational 
programs. A frequently discussed issue revolves around special grouping, 
a practice which, it has been argued, promotes attitudes of intolerance and 
undermines children’s acceptance of one another. This study attempted to 
examine the effects on children’s social relationships of a fast-learner pro- 
gram involving part-time special grouping. This article discusses two aspects 
of this study. One deals with methods and problems concerning the meas- 
urement of social relationships. This concerns the selection of instruments, 
the ways in which the variables of the study were defined, and certain 
limitations with which one is confronted in using sociometric instruments. 
The other deals with the implications of the findings of this study, not only 
for current practice but for future research. 


Description of the Study 


The school district in which the study took place is located in a San
Francisco Bay Area suburb. It provided a program of special classes twice
a week for periods of an hour-and-a-half in four subject areas: art,
biological science, physical science, and social studies. Fast-learners were
selected from the fourth through eighth grade classrooms in all eight schools
throughout the district on the basis of a score of 130 IQ and above on the
California Test of Mental Maturity and of 120 IQ and above on the short
form of the Stanford-Binet. The sixty-five classrooms which were found to
contain fast-learners were randomly divided, by school and by grade level,
into two groups, one of which was designated as experimental and the other
as control. These two groups were equivalent in size, IQ distribution,
sociometric status, and number of fast-learners. The fast-learners who were
in the experimental group participated in the special program; the ones in
the control group did not. The following classification of groups was used:

1. Experimental fast-learners (FLx), who attended special classes.

2. Control fast-learners (FLc), who did not attend special classes.

3. Non-fast-learner experimentals (NFLx), who were not designated as
fast-learners but attended classrooms in which fast-learners were
participating in the special program.

4. Non-fast-learner controls (NFLc), who were not designated as
fast-learners and who attended classrooms in which fast-learners did not
participate in the special program.

5. Experimental (X) classrooms, regular classrooms in which there were
experimental fast-learners.

6. Control (C) classrooms, regular classrooms in which there were control
fast-learners.


Mary Goldworth is a School Psychologist for the Sunnyvale School District.
Prior to her present position she was research assistant to Pauline Sears at
Stanford University and was a school psychologist interne in the Ravenswood
School District. Mrs. Goldworth received the Ed.D. degree from Stanford in
1957. This article is based upon her dissertation, “The Effects of a
Fast-Learner Program on the Social Relationships of Elementary School
Children,” under the direction of Dr. Arthur P. Coladarci. A companion study,
“Achievement Factors of Fast-Learners in a Partially Segregated Elementary
School Program with Special Reference to ... Instruction,” was completed by
Richard Hinze (Ed.D., Stanford, August ...).


CALIFORNIA JOURNAL OF EDUCATIONAL RESEARCH Vol. IX, No. 4


Remembering that the study was conducted under conditions where 
fast-learners attended special classes for a part of the time, five problems 
stated in question form were investigated. 


THE QUESTIONS


1. To what extent do children in a regular classroom accept each
other as friends?

2. To what extent do children accept fast-learner classmates as
friends?

3. To what extent do fast-learners accept other classmates as friends?

4. What is the effect of the program on the cohesion of the regular
classroom group as a whole, as indicated by the proportion of
mutual choices made?

5. Do fast-learners tend to form a sub-group or clique within the
regular classroom? That is, do they tend to choose each other
more than they choose their other classmates?


Pre-measures and post-measures were taken on all children in experi- 
mental and control classrooms. A comparison was then made between the 
two groups with respect to the degree of change which took place during 
the four-month period that the fast-learner program was in effect. Two
instruments administered in this study were selected from a great number 
of sociometric procedures which have been devised to measure the kinds of 
social relationships that are of concern here. The first is generally referred
to as the “sociometric test,” because it makes use of the basic sociometric
technique, which consists in asking each individual in a group to choose
from among the members of his group those with whom he would prefer
to associate for specific activities or in a particular situation.¹ The second
is the Columbia Classroom Social Distance Scale and is referred to as a
near-sociometric technique because it resembles the sociometric test and is
intended to measure similar phenomena.²

The rationale upon which the choice of the sociometric instruments in 
the present study is based may become clear when one reconsiders the 
questions to be investigated. For ease in following the discussion these 
will be briefly recapitulated here. Question one deals with the degree of 
acceptance of all children, fast-learners and non-fast-learners, by their class- 
mates—within their regular classroom groups. This requires an estimate of 
the relation of every member of a group to every other member. Since the 
Columbia social distance scale was designed for just such a situation, it 
was selected as the appropriate instrument to be used here. The socio- 
metric test would have been less satisfactory since each individual would 
make only a limited number of choices of those with whom he would prefer 
to associate, thus permitting the possibility that some individuals would 
receive no choices. The degree of acceptance was determined for each 
child by the number of ratings he received in the “best friend” category 
divided by the total number of ratings he received in all five categories. 
Question two deals with the degree of acceptance of fast-learners by their 
classmates—within their regular classroom groups. This requires an esti- 
mate of the relation between every member of a group and each fast- 
learner in the group. The information needed is the same as for question 
one except that it is limited to fast-learners only; hence the same instrument 
is again appropriate. The degree of acceptance was determined in the same 
manner except that instead of using all children only fast-learners are con- 
sidered. Question three deals with the degree of acceptance of classmates 
in the regular classroom groups by fast-learners. This requires an estimate 
of the relation between the fast-learners in a group and every other member 
in the group. The information needed is, in effect, the reverse of that in 
the second question. Again, the same instrument and the same procedure
were used, although this time one would use the ratings made by each
fast-learner of all his classmates rather than the ratings made by all the
classmates of each fast-learner.


¹ The following questions were used: With what three children would you
like to work best? With what three children would you like to play best?
What three children would you most like to have sit near you?

² The Columbia Classroom Social Distance Scale indicates a child’s social
preference for each of his classmates on a five-point scale, as follows:
1. I would like to have him as one of my best friends.
2. I would like to have him in my group but not as a close friend.
3. I would like to be with him once in a while but not for a long time.
4. I don’t mind his being in our room but I want nothing to do with him.
5. I wish he were not in our room.
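The acceptance index used for questions one to three is a ratio of counts from the five-point social distance scale: “best friend” ratings divided by all ratings. A minimal sketch, assuming each rating is stored as a (rater, ratee) pair mapped to a scale point from 1 to 5; the names `acceptance_received` and `acceptance_given` are illustrative, not the author’s. For question two the first function is simply applied to fast-learners only; for question three the second function is applied to each fast-learner’s own ratings of his classmates.

```python
from typing import Dict, Iterable, Tuple

BEST_FRIEND = 1  # point 1 on the Columbia scale: "one of my best friends"
Ratings = Dict[Tuple[str, str], int]  # (rater, ratee) -> scale point 1..5


def _share_best_friend(points: Iterable[int]) -> float:
    """Fraction of the given scale points that are "best friend" ratings."""
    points = list(points)
    if not points:
        raise ValueError("no ratings to summarize")
    if any(p not in range(1, 6) for p in points):
        raise ValueError("scale points must be 1..5")
    return sum(1 for p in points if p == BEST_FRIEND) / len(points)


def acceptance_received(ratings: Ratings, child: str) -> float:
    """Questions 1 and 2: best-friend ratings received / all ratings received."""
    return _share_best_friend(p for (_, ratee), p in ratings.items() if ratee == child)


def acceptance_given(ratings: Ratings, child: str) -> float:
    """Question 3: best-friend ratings given / all ratings the child gave."""
    return _share_best_friend(p for (rater, _), p in ratings.items() if rater == child)


if __name__ == "__main__":
    # Hypothetical three-pupil class; "A" is the fast-learner.
    ratings = {
        ("A", "B"): 1, ("A", "C"): 3,
        ("B", "A"): 1, ("B", "C"): 1,
        ("C", "A"): 2, ("C", "B"): 1,
    }
    print(acceptance_received(ratings, "A"))  # 0.5
    print(acceptance_received(ratings, "B"))  # 1.0
    print(acceptance_given(ratings, "A"))     # 0.5
```

Because every child rates every classmate, each child receives as many ratings as there are other children in the room, so no child can end up with an undefined index; this is the advantage over the sociometric test noted above.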

Question four is concerned with the degree of cohesion among all chil- 
dren, including fast-learners, by regular classroom group, based on a mutual 
choice index. Since methods of analysis for handling this kind of problem 
have been developed predominantly on the basis of sociometric test data, it 
was felt that the sociometric test would be most pertinent here. If data 
from the social distance scale were to be used instead, methods of analysis 
would have to be adapted to this instrument and would be a great deal 
more complicated because of the fact that there would be a differing num- 
ber of choices made by each individual for every “point” on the scale. 
Group cohesion was determined for each regular classroom by dividing the 
actual number of mutual choices by the total possible number of mutual 
choices. Question five is concerned with the degree of sub-group preference,
i.e., fast-learners’ preference for other fast-learners, within their regular
classroom groups. The sociometric test is applicable here for the same 
reasons as for question four. Sub-group preference was determined for 
each regular classroom by dividing the number of choices made by fast- 
learners of other fast-learner classmates by the total possible number of 
such choices that could have been made by fast-learners. 
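The two sociometric-test indices for questions four and five can be sketched as follows. The article does not spell out how the “total possible” counts were obtained, so both denominators here are assumptions: with n children each making k choices, a mutual pair uses one choice from each partner, so at most floor(n*k/2) mutual pairs can occur; and f fast-learners can together make at most f*min(k, f-1) choices of one another. Choices are stored as a hypothetical mapping from each chooser to the set of classmates he chose on one criterion.

```python
from itertools import combinations
from typing import Dict, Set

Choices = Dict[str, Set[str]]  # chooser -> classmates chosen on one criterion


def mutual_pairs(choices: Choices) -> int:
    """Count unordered pairs of children who chose each other."""
    return sum(
        1
        for a, b in combinations(sorted(choices), 2)
        if b in choices[a] and a in choices[b]
    )


def group_cohesion(choices: Choices, k: int) -> float:
    """Mutual pairs / possible mutual pairs (assumed to be floor(n * k / 2))."""
    possible = len(choices) * k // 2
    if possible == 0:
        raise ValueError("no mutual choices are possible")
    return mutual_pairs(choices) / possible


def subgroup_preference(choices: Choices, fast_learners: Set[str], k: int) -> float:
    """Choices of fast-learners by fast-learners / possible such choices.

    Assumes each fast-learner can name at most min(k, f - 1) other fast-learners.
    """
    fl = [c for c in choices if c in fast_learners]
    made = sum(len(choices[c] & (fast_learners - {c})) for c in fl)
    possible = len(fl) * min(k, len(fl) - 1)
    if possible <= 0:
        raise ValueError("need at least two fast-learners and k >= 1")
    return made / possible


if __name__ == "__main__":
    # Hypothetical class of four, two choices each; A and C are fast-learners.
    choices = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "B"}, "D": {"B", "C"}}
    print(mutual_pairs(choices))                         # 3
    print(group_cohesion(choices, 2))                    # 0.75
    print(subgroup_preference(choices, {"A", "C"}, 2))   # 1.0
```

If the study instead divided by the total number of choices the fast-learners actually made (f*k), only the `possible` line in `subgroup_preference` would change.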


Findings 


1. At all grade levels the proportion of children showing an increase in 
the degree to which they were accepted as friends by their classmates was 
significantly greater for control children than for experimental children 
(P<.001). 

2. In grades 4-6, when experimental fast-learners were compared with 
control fast-learners, a greater proportion of controls showed an increase in 
the degree to which they were accepted as friends by their classmates, and 
on the other hand a greater proportion of experimentals showed a decrease. 
Although the difference between these two groups was not significant, it 
was large enough to warrant attention, especially since the change was in 
the same direction as that found in question one (.10>P>.05). At the 
seventh and eighth grade levels the difference between experimental and 
control fast-learners was not significant (.80>P>.70).

3. In regard to fast-learners’ acceptance of their classmates as friends, 
group cohesion in the regular classroom and sub-group preference in the 
regular classroom, there were no significant differences between comparable 
experimental and control groups. The probabilities obtained ranged from 
.20>P>.10 to .99>P>.98. 

It was concluded that, despite the occurrence of some negative changes, 
children’s relationships, as defined herein, remained stable to a consider- 
able degree. 
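The findings compare, for experimental and control children, the proportion whose acceptance rose between the pre-measure and the post-measure. The article does not name the significance test behind its P values; a Pearson chi-square on a 2 x 2 table (group by increased / not increased) is one conventional choice and is sketched here as an assumption, without Yates’s continuity correction. The function names are hypothetical, and the counts in the example are invented, not the study’s data.

```python
import math
from typing import Sequence, Tuple


def count_increases(pre: Sequence[float], post: Sequence[float]) -> Tuple[int, int]:
    """Return (children whose index rose, children whose index did not rise)."""
    if len(pre) != len(post):
        raise ValueError("pre and post must cover the same children")
    rose = sum(1 for before, after in zip(pre, post) if after > before)
    return rose, len(pre) - rose


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square for the table [[a, b], [c, d]], 1 degree of freedom.

    Rows are groups (e.g. control, experimental); columns are
    (increased, not increased). Returns (statistic, p-value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


if __name__ == "__main__":
    print(count_increases([0.2, 0.5, 0.3], [0.4, 0.5, 0.1]))  # (1, 2)
    stat, p = chi_square_2x2(20, 10, 10, 20)  # invented counts
    print(round(stat, 3), p)
```

With small expected counts in any cell, an exact test or a continuity correction would be the more cautious choice.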
Discussion


Several precautions are necessary in interpreting these findings. First, 
with regard to educational program, a distinction must be made between 
that which is concerned with administrative procedure, e.g., special grouping,
and actual learning experiences gained by the child. It is conceivable
that two programs similar in procedure may qualitatively differ as to their 
educational value. Thus, although we may be justified in viewing special 
grouping as an acceptable educational practice, we would be in error to 
assume that its application automatically assures the desired educational 
outcome. 

Second, it should be clear that this study does not compare the relative 
merits of different types of provisions in the education of gifted children. 
Rather, it compares a particular type of special grouping with the kind of 
treatment ordinarily provided within a regular classroom setting. The 
results of this comparison imply that, in terms of the variable studied, the 
regular classroom environment is about equivalent in effect to that provided 
by the special grouping program. However, it is possible that special group- 
ing provides greater opportunity for individual growth in areas as yet unas- 
certained. 


Problems and Limitations 


Several problems commonly arise in employing the Sociometric Test as 
an instrument. First is the question of whether children should be asked 
to make their choices in general or specific terms. A general question might 
be, “Who is your best friend?” whereas a specific question might ask,
“With whom would you like to sit?” Since this study was concerned with
children’s reactions to specific situations, the latter technique was employed.

The second problem is concerned with whether the choice is real or 
hypothetical. The choice is real if, after it is made, it is brought about in 
actuality; that is, if John chooses to sit next to Mary, he is subsequently 
given the opportunity to do so. The choice is hypothetical if it is not 
brought about in reality. In this study the use of hypothetical criteria
seemed justified, since there appeared to be no research which might
make them questionable.

The number of criteria to be used is also a problem and has not been
settled experimentally. Since it has been found that results vary with
different criteria (Bronfenbrenner, p. 50), it was decided that more than
one should be used. Consequently, three criteria with three choices for
each one were employed. A further consideration is that particular criteria
should cover broad areas of experience common to all children being
studied and also represent varied aspects of group life.
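To make the choice procedure concrete, the sketch below tallies the choices each child receives under one criterion and lists mutual choices. The children, their choices, and the helper names are invented for illustration; the study used three criteria with three choices each, so in practice the tally would be repeated for each criterion.

```python
from collections import Counter

# Hypothetical responses: under each criterion, every child names three
# classmates. All names and choices are invented.
choices = {
    "sit next to": {
        "John": ["Mary", "Ann", "Bill"],
        "Mary": ["Ann", "John", "Sue"],
        "Ann": ["Mary", "Sue", "John"],
        "Bill": ["John", "Mary", "Ann"],
        "Sue": ["Ann", "Mary", "Bill"],
    },
}

def choices_received(responses):
    """Count how many times each child is named under one criterion."""
    tally = Counter()
    for _, chosen in responses.items():
        tally.update(chosen)  # each named classmate receives one choice
    return tally

def mutual_pairs(responses):
    """Return the pairs of children who named each other."""
    pairs = set()
    for a, chosen in responses.items():
        for b in chosen:
            if a in responses.get(b, []):
                pairs.add(tuple(sorted((a, b))))
    return pairs

received = choices_received(choices["sit next to"])
print(received.most_common())
print(sorted(mutual_pairs(choices["sit next to"])))
```

Each child gives three choices, so the received counts always total three times the number of children (15 here).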

The use of any kind of sociometric instrument presents certain limitations,
particularly when employed with large numbers of subjects, as was
the case here. That is, such instruments give only very specific information
about each individual subject: namely, his stated preferences for others in
his group. They do not examine the motives underlying these preferences,
and they certainly do not delve into dimensions of personality as might be
possible with a case study approach. One would therefore be unjustified in
using sociometric data as a basis for generalizations regarding such broad
concepts as mental health or socioemotional growth.

Further, the use of any evaluation instrument involves a problem of 
underlying value assumptions. For example, on an achievement test it is 
assumed that a high score is more desirable than a low one. This is so 
because we can usually agree that academic learning is valuable or desir- 
able. Likewise, with a sociometric instrument one assumes that it is good 
to like others and to be liked by them. However, our social values are com- 
plex and sometimes conflicting. We may wish our children to be friendly 
and popular, but we also wish them to be creative, think independently, 
show initiative, and develop and maintain their individuality. Awareness 
of such value implications is basic to the interpretation of a study such as
the present one. 





Implications for Further Research 


The findings of this investigation have briefly been examined as to their
implications for current practice. Some of the many questions which remain 
to be answered by future research are indicated in the subsequent para- 
graphs. 

What of the relationships and effects within the special groups
themselves? It has been suggested that special grouping permits children to
develop closer, more satisfying friendships with those who share their inter-
ests and who are more nearly their intellectual equals. On the other hand,
it has also been suggested that a highly competitive atmosphere is threaten-
ing or discouraging. Further evidence is needed to resolve these questions.

How does special grouping affect the development of social responsibil- 
ity and leadership? Does it foster a concern for the problems of society 
and a willingness to assume responsibility? Is it an aid in developing
leadership ability? Although there is reason to believe that such outcomes may
be obtained, further corroboration in a variety of special grouping settings 
seems necessary. 

What is the effect of specific special grouping practices on long-range 
personal-social adjustment? Are there any particular advantages regarding 
the adjustment which children make in their adolescence and adulthood in
such areas as academic scholarship, marriage and parenthood, vocation, and 
citizenship? 

And finally, what is the effect on less gifted children not placed in spe- 
cial groups? The findings of one study* imply that these children feel 
deprived of the prestige which accompanies assignment to a special
(“bright”) group. On the other hand, it is believed by some school
authorities that a part-time special grouping program will enhance the
educational environment not only for fast-learners but for children who do not
attend special classes. This question seems to be largely neglected in the 
research literature and is one which could well be investigated. 


Conclusion 


Public education is concerned with providing for the needs of all chil- 
dren. This study has involved only a few problems within one segment 
of a total school program. Its emphasis has been on certain relationship 
effects which have frequently been classified as intangibles. It calls atten- 
tion to the need for clear awareness of educational objectives and continu- 
ous evaluation of outcomes. Only by such methods can intelligent adapta- 
tion to needs be made. 
Are You Eligible for Membership in
THE AMERICAN EDUCATIONAL RESEARCH ASSOCIATION 


Many administrators, supervisors, and guidance consultants, as well as research 
workers, are eligible for membership in the American Educational Research 
Association. 

Eligibility for election to active membership in AERA is based upon
satisfactory evidence of:

Active interest in educational research,
Ability to contribute constructively to educational research,
Desire to further the objectives of the association,
Intention to participate further in educational research,
Professional training of at least a master’s degree level.


Those interested in becoming active members may apply directly to
Dr. Frank W. Hubbard, Secretary-Treasurer, AERA, 1201 Sixteenth Street,
Northwest, Washington 6, D. C., or they may have their names submitted by
an active member. Applications are reviewed by a committee of the
Association, and those approved are invited into membership by the
Secretary-Treasurer.


The dues of active members are ten dollars, payable before March 1 of each 
calendar year. Active members receive all the privileges of membership including
the right to vote and to hold office. 


Included among the publications of the American Educational Research
Association are the AERA Newsletter, which is distributed four times a year, and
the Review of Educational Research, which is published five times a year. The Review
contains brief summaries of all research over a three-year period on such topics 
as educational psychology, educational measurement, administration, curriculum, 
guidance and counseling, mental and physical development, research methods, spe- 
cial education, teacher education, teacher personnel, educational and psychological 
testing. 


The National Convention of the American Educational Research Association 
is held at Atlantic City in conjunction with the Annual Convention of the American 
Association of School Administrators. The President of the American Educational 
Research Association is Dr. David H. Russell, Professor of Education, University 
of California at Berkeley. 





REFERENCES (Continued from page 173) 


9. Northway, Mary L. A Primer of Sociometry. Toronto: University of Toronto 
Press, 1952. 


10. Passow, A. Harry, and others. Planning for Talented Youth: Considerations
for Public Schools. (Publication No. 1, Talented Youth Project, Horace
Mann-Lincoln Institute of School Experimentation.) New York: Teachers College,
Columbia University, 1955. 84 p.


11. Proctor, Charles H., and Loomis, Charles P. “Analysis of Sociometric Data.”
In Marie Jahoda, Morton Deutsch, and Stuart Cook (eds.), Research Methods
in Social Relations with Especial Reference to Prejudice: Part II (separate
vol.), Selected Techniques, pp. 561-585. New York: Dryden Press, 1953.


12. Sumption, Merle R. Three Hundred Gifted Children. New York: World Book 
Company, 1941. 







College Achievement of High School 
Vocational Agriculture Students 


ORVILLE E. THOMPSON


Controversy is unending about which kind of high school curriculum 
best prepares students for academic success in college. Concern has been 
expressed about the value of vocational agriculture for the high school 
student who has the ability to do college work. Since the vocational agri- 
culture curriculum might prevent such a student from enrolling in all of 
the courses in the usual college preparatory program, the question is: should 
a student aiming for an agricultural college take vocational agriculture or 
enroll in the college-directed curriculum? 

For years, leaders in agricultural education in California have empha- 
sized the value of agriculture in high school for those who plan to continue in 
this field in college. However, there is little local up-to-date evidence to 
support this claim. All of the many studies conducted in other states on 
the achievement of former vocational agriculture students in agricultural 
college have shown that these students were not penalized scholastically 
by having taken agriculture in high school. Studies that compare college
grades in agricultural subjects alone show that former vocational agriculture
students invariably made higher grades in these subjects.


Problem 


A study was begun in the spring semester, 1958, at the University of 
California College of Agriculture at Davis, to compare the total academic 
achievement of students who had taken vocational agriculture in high 
school, as measured by total grade point average, with students who had 
not had agriculture in high school. 


Procedure 


It was found that about one-third of the men students enrolled in the
College of Agriculture at Davis had completed one or more years of
vocational agriculture in high school; more than half of these had completed
a four-year course. Since in many schools it is possible for a student to take
one or two years of agriculture as electives in the college preparatory
program, only students who had completed three or more years of agriculture
in high school were included in the study. In addition, it was decided
that at least two semesters of college work were necessary to establish a
fair measure of the student’s academic success. Students in this institution
who had completed a portion of their collegiate work at another school
were included, although only the grades earned at the University of
California were considered.

Orville E. Thompson is an Assistant Professor at the University of California
at Davis, a position he has held for four years. He received his doctorate from
Cornell University in 1954 and prior to coming to the University of California had
taught at Cornell University and in secondary schools.
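The two inclusion rules can be written as a simple filter. The record layout, roster, and function name below are hypothetical, and the sketch counts only semesters completed at Davis toward the two-semester minimum, which is one reading of the text rather than a rule it states.

```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    hs_ag_years: int         # years of vocational agriculture in high school
    semesters_at_davis: int  # semesters completed at the University of California
    transfer: bool           # True if some collegiate work was done elsewhere

def meets_criteria(s: Student) -> bool:
    # Rule 1: three or more years of high school agriculture.
    # Rule 2 (assumed reading): at least two semesters at Davis.
    # Transfer students are not excluded; only their Davis grades count.
    return s.hs_ag_years >= 3 and s.semesters_at_davis >= 2

roster = [
    Student("A", 4, 4, False),
    Student("B", 2, 6, False),  # only two years of agriculture: excluded
    Student("C", 3, 1, True),   # only one semester at Davis: excluded
    Student("D", 3, 2, True),   # transfer student who qualifies
]
eligible = [s.name for s in roster if meets_criteria(s)]
print(eligible)  # ['A', 'D']
```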

The investigators found that seventy-five students enrolled in agricul- 
ture at Davis met the above criteria. Of these, fifty-three had completed 
all their collegiate work at this institution. The remaining twenty-two had 
previous collegiate work at other institutions. In the group that had done 
all their work at Davis there were sixteen sophomores, twenty-four juniors, 
and thirteen seniors. One transfer student was a sophomore, four were 
juniors, and seventeen were seniors. 

A random sampling process was used to select groups for comparisons 
of college achievement. Separate randomly selected groups were drawn to 
match each of the categories of former vocational agriculture students. The 
only obvious difference between the two sets of students was that one had 
taken three or more years of vocational agriculture in high school. 
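The article does not say how the random draw was carried out. One way to draw comparison groups that match each category in size is sketched below; the student pools, ID format, pool size, and seed are all hypothetical.

```python
import random

def draw_comparison_groups(ag_counts, pools, seed=None):
    """For each category, draw at random as many non-agriculture students
    as there are former vocational agriculture students in that category."""
    rng = random.Random(seed)
    return {cat: rng.sample(pools[cat], n) for cat, n in ag_counts.items()}

# Category sizes of the non-transfer group described in the text.
ag_counts = {"sophomore": 16, "junior": 24, "senior": 13}
# Hypothetical pools of IDs for students without high school agriculture.
pools = {cat: [f"{cat}-{i}" for i in range(200)] for cat in ag_counts}

groups = draw_comparison_groups(ag_counts, pools, seed=1958)
print({cat: len(ids) for cat, ids in groups.items()})
# {'sophomore': 16, 'junior': 24, 'senior': 13}
```

Sampling without replacement within each category keeps the two sets equal in size and class composition, which is what the matched comparisons in the tables require.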

The t test of significance was used to determine whether there were significant
differences in academic achievement as measured by grade point averages, 
computed on a four point system, between students who had taken three 
or more years of vocational agriculture in high school and students who 
had not. 
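The article states only that averages were computed on a four point system. A minimal sketch, assuming the common scale A = 4 through F = 0 and weighting each grade by course units (both are assumptions, as is the function name), shows how such a grade point average is formed.

```python
# Assumed letter-grade scale; the article says only "a four point system".
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def grade_point_average(courses):
    """courses: list of (letter_grade, units). Returns the unit-weighted GPA."""
    total_units = sum(units for _, units in courses)
    points = sum(GRADE_POINTS[grade] * units for grade, units in courses)
    return points / total_units

gpa = grade_point_average([("A", 3), ("B", 4), ("C", 5)])
print(round(gpa, 2))  # 2.83
```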


Discussion 


In no instance was there found a significant difference in scholastic 
achievement between students who had taken three or more years of agri- 
culture in high school and those prepared in other curricula. 

Table I shows a comparison of grade point averages of the samples of 
non-transfer students by classes and by the three classes combined. It can 
be noted in the sophomore class that the difference in the mean grade 
point average was 0.01. The t value obtained was only 0.035. Since this 
is smaller than the tabled value of 2.13, at the 5 per cent level with 15 
degrees of freedom, it can be concluded that there is no significant differ- 
ence in academic achievement, as measured by grade point average, between 
these two groups of sophomores. 
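The article does not print its formula, but a pooled-variance two-sample t computed from each row's N, ΣX and ΣX² reproduces the reported sophomore value of 0.035. The function name is hypothetical. Note that the article compares t with the tabled value for 15 degrees of freedom (N - 1); the pooled formula would conventionally use N1 + N2 - 2 = 30, whose 5 per cent value (about 2.04) is lower, but the sophomore t falls far below either.

```python
import math

def t_from_sums(n1, sum_x1, sum_sq1, n2, sum_x2, sum_sq2):
    """Independent-samples t with pooled variance, from N, ΣX and ΣX²."""
    ss1 = sum_sq1 - sum_x1 ** 2 / n1   # sum of squared deviations, group 1
    ss2 = sum_sq2 - sum_x2 ** 2 / n2   # sum of squared deviations, group 2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (sum_x1 / n1 - sum_x2 / n2) / se_diff

# Sophomore rows of Table I: vocational agriculture vs. none.
t_soph = t_from_sums(16, 39.528, 101.732, 16, 39.432, 100.323)
print(f"{t_soph:.3f}")     # 0.035
print(abs(t_soph) < 2.13)  # True: below the tabled value used in the article
```

The same call on the transfer rows of Table II (22, 51.691, 123.718 against 22, 49.795, 115.128) gives about 0.86, the value reported there.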

A mean difference of 0.26 was found between the grade point averages
of the two groups of juniors. This is considerably larger than that found
with the sophomores, but is still small enough to have arisen by chance.
The t value of 2.04 is smaller than the tabled value of 2.07 and so is not
significant at the 5 per cent level.

The difference of 0.05 in mean average for the groups of seniors was
likewise not large enough to demonstrate a significant difference between
the two groups. The combination of the three classes of students showed
a difference in mean grade point averages of only 0.12. A non-significant
t value of 1.48 was obtained.


TABLE I

College Achievement of Non-Transfer Students Having Vocational Agriculture
in High School and Those Not Having Had Vocational Agriculture

                                       College Grade Point Averages
Former High School Curriculum   Number      ΣX       X̄       ΣX²        t      t.05
Sophomores                                                              0.035   2.13
  Vocational Agriculture          16      39.528    2.47    101.732
  No Vocational Agriculture       16      39.432    2.46    100.323
Juniors                                                                 2.04    2.07
  Vocational Agriculture          24      56.449    2.35    135.494
  No Vocational Agriculture       24      62.551    2.61    168.886
Seniors                                                                 0.43    2.18
  Vocational Agriculture          13      31.731    2.44     78.673
  No Vocational Agriculture       13      32.360    2.49     81.369
Combined Groups                                                         1.48    2.02
  Vocational Agriculture          53     127.708    2.41    315.898
  No Vocational Agriculture       53     134.343    2.53    350.579





The comparison of grade point averages of the transfer students who
had had vocational agriculture and those who had not had this curriculum
showed a difference of 0.09 (Table II). These groups were combinations
of sophomores, juniors, and seniors. The t value of 0.86 obtained is smaller
than the tabled value of 2.08 needed for significance at the 5 per cent level
with 21 degrees of freedom. Here again, no significant difference in means
was demonstrated.

TABLE II

College Achievement of Transfer Students Having Had Vocational Agriculture
in High School and Those Not Having Had Vocational Agriculture

                                       College Grade Point Averages
Former High School Curriculum   Number      ΣX       X̄       ΣX²        t      t.05
Vocational Agriculture            22      51.691    2.35    123.718    0.86    2.08
No Vocational Agriculture         22      49.795    2.26    115.128

When non-transfer and transfer students are considered as a group, a 
mean average difference of 0.06 is obtained. The t value of 0.96 is not 
large enough to demonstrate a significant difference in academic achieve- 
ment between these two large groups. 
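Because ΣX and ΣX² are simple sums, the combined rows of Table III can be formed by adding the combined non-transfer line of Table I to the transfer rows of Table II. The sketch below does that addition (the helper name is hypothetical) and recomputes the mean difference of 0.06. The ΣX totals match Table III exactly; the ΣX² totals agree with it to within one unit in the last decimal place.

```python
# (N, ΣX, ΣX²) for the combined line of Table I (non-transfer)
# and the rows of Table II (transfer).
non_transfer = {"ag": (53, 127.708, 315.898), "no_ag": (53, 134.343, 350.579)}
transfer = {"ag": (22, 51.691, 123.718), "no_ag": (22, 49.795, 115.128)}

def combine(*rows):
    """Pool (N, ΣX, ΣX²) rows by adding each component."""
    return tuple(sum(parts) for parts in zip(*rows))

ag = combine(non_transfer["ag"], transfer["ag"])
no_ag = combine(non_transfer["no_ag"], transfer["no_ag"])
mean_ag = ag[1] / ag[0]
mean_no_ag = no_ag[1] / no_ag[0]
print(ag[0], round(ag[1], 3), round(no_ag[1], 3))  # 75 179.399 184.138
print(round(mean_no_ag - mean_ag, 2))              # 0.06
```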


TABLE III

College Achievement of Combination of Non-Transfer and Transfer
Students Having Had Vocational Agriculture in High School
and Those Not Having Had Vocational Agriculture

                                       College Grade Point Averages
Former High School Curriculum   Number      ΣX       X̄       ΣX²        t      t.05
Vocational Agriculture            75     179.399    2.39    439.617    0.96    2.00
No Vocational Agriculture         75     184.138    2.45    465.706











Conclusions and Implications 


This means of comparing the academic achievement of two groups has 
certain recognized weaknesses, the most obvious being that it assumes 
both groups have taken course patterns of equal difficulty. Yet the fact 
that none of the categories showed significant differences cannot be ignored. 
If there are differences in college academic achievement between students 
at the College of Agriculture at Davis who had three or more years of 
vocational agriculture in high school and agricultural college students who 
have not had this course, these differences are not shown by grade point 
averages. 

This conclusion should not be construed to mean that the vocational 
agriculture program in the high school can rest upon past achievements. 
Agricultural teachers and school administrators must continue to improve 
the instruction in these programs. They should continually encourage agri- 
cultural students to enroll in a liberal offering of the basic subjects in the 
school. 

It should be emphasized that such vocations as veterinarian, agricultural
teacher, agricultural extension worker, and employee of many agricultural
business concerns often require farm experience and a college degree.
Vocational agriculture in high school provides an excellent opportunity for
students to receive farm experience. In addition, many vocational agriculture
students find that this subject gives meaning to subjects such as mathematics,
physics, and other sciences, and so may aid them in their study of those subjects.


(Continued on page 185) 
Differences in Interest Patterns According
To High School Major Sequences 


JOHN ALLAN SMITH and PHILIP G. NASH


In recent years there has been an increasing use of interest inventory 
scores as counseling devices in assisting the student to make a choice of a 
high school major sequence. In using interest inventory scores for this 
type of guidance, two assumptions are made: (1) that the interests of 
students in the ninth or tenth grade are relatively fixed, and (2) that the 
students taking the various high school major sequences represent homo- 
geneous groupings to the extent that they may be expected to show com- 
mon patterns of interests. This study was designed to test the second 
assumption: Do senior high school students show group differences in inter- 
est patterns according to their major sequences? 


Procedure 


A total of 1015 seniors from Los Angeles City high schools were admin- 
istered the Occupational Interest Inventory.1 The scores were grouped for 
both boys and girls according to the major sequences of the students. In 
some instances the number of students within a major sequence grouping 
was so small that groups were combined where such combinations were 
appropriate. For example, girls taking Mathematics or Mathematics-Science 
major sequences were combined into one group, and all the boys taking 
the various business education majors, Bookkeeping, Clerical, Salesmanship, 
and Stenographic, were combined into one group. For both boys and girls, 
students with major sequences in Art and Music were combined under the 
heading “Fine Arts.” 

The Occupational Interest Inventory measures six fields of interest,
three types of interest, and a “level of interests.” This study is confined
to an analysis of the relationship between the six fields of interest and
the major sequences. The six fields of interest included in the Occupational
Interest Inventory are: Personal-Social, Natural, Mechanical, Business, Fine
Arts, and The Sciences. The subject obtains a score in each of these fields
by making a two-way forced choice on each of the 120 items.

1 Edwin A. Lee and Louis P. Thorpe, Occupational Interest Inventory, Advanced
Series, Los Angeles, California Test Bureau, 1943.

Philip G. Nash, Specialist, Evaluation and Research Section, Los Angeles City
School Districts, has been employed by this school district for twelve years. He
was a junior high school teacher for seven years prior to assuming his present
position five years ago. He has engaged in advanced work at the University of
Southern California.

John Allan Smith is currently the Assistant Superintendent of Educational
Services in the Paramount Unified School District. Dr. Smith has been in
Paramount one year, but prior to 1957 he had served in the Los Angeles City
Schools for 29 years. He received an Ed.D. from U.S.C. in 1948.

The scores obtained in each of the interest fields were tested by analysis 
of variance to determine if there was a significant difference between 
scores among the students according to their major sequences. 
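A one-way analysis of variance of the kind described compares the spread of the group means with the spread of scores within groups. The sketch below computes the F ratio for a single interest field from raw scores; the scores are invented and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """F ratio = between-groups mean square / within-groups mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Invented scores in one interest field for two major-sequence groups.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

For the boys, with ten major-sequence groups and 473 students, each field's F ratio has 9 and 463 degrees of freedom; for the girls, with eleven groups and 542 students, 10 and 531.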

One of the conditions in testing significance by analysis of variance
is the necessity that observations within groups be mutually independent.2
To fulfill this requirement, separate F tests were made for each of the six
interest fields. In a forced-choice interest inventory, the scores for each
student in all sections of the inventory are interdependent. High scores in
two or three fields of the inventory result in low scores in two or three 
other fields of the inventory. However, the score of a subject in any single 
interest field fulfills the requirement of independence, since his score is 
limited only by the number of items related to that particular field of 
interest (in this case, 40). 
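The interdependence described above follows from the forced-choice format. The sketch assumes, purely for illustration, that each of the 120 items pairs two of the six fields and that every pair of fields appears on 8 items; the article does not describe the item layout, only that each field score is limited to 40. Under that assumption each field appears on exactly 40 items and the six field scores always sum to 120, so a high score in one field forces lower scores elsewhere.

```python
import random
from itertools import combinations

FIELDS = ["Personal-Social", "Natural", "Mechanical",
          "Business", "The Arts", "The Sciences"]

# Hypothetical layout: each of the 15 pairs of fields appears on 8 items,
# giving 15 x 8 = 120 items and 5 x 8 = 40 items per field.
ITEMS = [pair for pair in combinations(FIELDS, 2) for _ in range(8)]

def score(answers):
    """answers[i] is the field chosen on item i; returns a score per field."""
    if len(answers) != len(ITEMS):
        raise ValueError("one answer per item is required")
    scores = dict.fromkeys(FIELDS, 0)
    for (a, b), pick in zip(ITEMS, answers):
        if pick not in (a, b):
            raise ValueError(f"{pick!r} is not an option on this item")
        scores[pick] += 1
    return scores

rng = random.Random(0)
answers = [rng.choice(pair) for pair in ITEMS]
scores = score(answers)
print(sum(scores.values()))        # 120, whatever the answers
print(max(scores.values()) <= 40)  # True
```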

Table I presents the mean scores of 473 male 12th grade students in
the six interest fields according to their major sequences. The table also
lists the mean scores of all male students in the various interest fields, the
F ratio obtained for each interest field, and the significance of the F ratio.
Table II presents similar data for 542 12th grade female students. Separate
F tests were obtained for boys and girls, since interest inventories
differentiate to a large degree between the sexes.

TABLE I

Mean Scores of Male 12th Grade Students in Fields of Occupational
Interest Inventory According to High School Major Sequences

N = 473                                 Fields of Interest
Major                 Personal-                                    The     The
Sequence          N   Social     Natural  Mechanical  Business  Arts    Sciences
Mathematics      48   15.31      18.65    22.83       21.65     17.68   22.96
Math.-Science    57   16.21      18.33    22.19       17.44     16.81   27.35
Science          64   18.08      20.63    19.64       19.16     17.63   24.17
Foreign
  Languages      43   17.56      18.53    19.00       21.21     19.28   22.56
Social Studies   18   18.72      18.61    21.56       21.56     20.89   17.72
General          31   16.48      18.39    20.45       20.10     19.90   21.32
Business
  Education      20   17.35      18.25    17.30       27.80     19.70   19.95
Fine Arts        15   15.53      20.73    20.27       16.67     27.07   18.47
Industrial Arts  96   14.25      21.06    23.84       19.16     17.83   21.58
Vocational Arts  81   13.21      23.03    24.86       17.63     19.04   21.26

Total           473   15.72      20.09    22.01       19.57     18.67   22.47
F ratio                5.76       3.83     7.59        5.?1      5.15    7.36
Significance            .01        .01      .01         .01       .01     .01

2 J. P. Guilford, Fundamental Statistics in Psychology and Education, New
York, McGraw-Hill Book Company, Inc., 1956, p. 282.


TABLE II

Mean Scores of Female 12th Grade Students in Fields of Occupational
Interest Inventory According to High School Major Sequences

N = 542                                 Fields of Interest
Major                 Personal-                                    The     The
Sequence          N   Social     Natural  Mechanical  Business  Arts    Sciences
Mathematics &
  Math.-Science  15   23.40      13.35    17.40       19.67     25.73   20.93
Science          55   26.55      13.53    13.55       21.75     25.65   18.82
Foreign
  Languages      84   24.57      15.15    14.45       21.62     27.24   16.24
Social Studies   16   23.44      13.44    15.81       22.75     26.63   17.94
General          23   24.87      12.00    13.22       23.30     24.26   17.65
Bookkeeping      45   22.78      11.87    15.60       27.87     25.33   15.78
Clerical         83   23.46      12.58    14.94       26.39     25.05   17.41
Salesmanship     26   22.12      17.46    16.15       21.88     24.23   17.38
Stenographic    112   23.51      12.04    15.04       26.64     26.30   15.42
Fine Arts        28   22.46      13.50    15.75       20.43     29.93   15.68
Homemaking       35   24.05      14.71    15.67       23.57     24.51   17.98

Total           542   23.72      13.42    14.99       24.11     25.92   16.92
F ratio                2.37       2.19     5.84        7.72      2.61    2.95
Significance            .01        .05      .01         .01       .01     .01











Further tests of significance were made to determine in what fields of 
interest students with the various major sequences had mean scores higher 
or lower than the means of all the students in the study. Tables III and IV 
present a summary of these data. 
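The Difference of Means column in Tables III and IV is the major-sequence mean minus the mean of all students of the same sex. The article does not state which test was applied to each difference, so the sketch reproduces only the arithmetic, using three pairs of means from Table I.

```python
# (major-sequence mean, mean of all male students), copied from Table I.
rows = {
    ("Business Education", "Business"): (27.80, 19.57),
    ("Fine Arts", "The Arts"): (27.07, 18.67),
    ("Science", "Mechanical"): (19.64, 22.01),
}

for (major, field), (group_mean, all_mean) in rows.items():
    diff = group_mean - all_mean  # positive means above the all-student mean
    print(f"{major:20s} {field:12s} {diff:+.2f}")
```

The three lines print +8.23, +8.40 and -2.37, matching the corresponding entries of Table III.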


Findings 


The F ratios obtained between the major sequence groupings for each 
of the six interest fields were significant at the .01 level for 11 of the 12 
sets of scores tested. The exception was in the case of the scores made by 
girls in the Natural field of interest. In this case the F ratio obtained was
significant only at the .05 level.

It is apparent that all fields of the Occupational Interest Inventory 
differentiate between both male and female students according to the major 
sequences of the students, or, stated differently, students taking the various
major sequences may not be considered to be random samples of the total 
high school population with respect to the patterns of their various interests. 


TABLE III

Significantly High and Low Mean Scores of Male 12th Grade Students
in Fields of Occupational Interest Inventory According to High School
Major Sequences

                                          Mean by   Mean of All  Difference  Significance
High School         Field of              Major     Male         of          of
Major Sequence      Interest              Sequence  Students     Means       Difference
Mathematics         (No significantly high or low mean scores)
Math.-Science       The Sciences          27.35     22.47        +4.88       .01
                    Business              17.44     19.57        -2.13       .01
                    The Arts              16.81     18.67        -1.86       .05
                    Natural               18.33     20.09        -1.76       .05
Science             Personal-Social       18.08     15.72        +2.36       .01
                    The Sciences          24.17     22.47        +1.70       .05
                    Mechanical            19.64     22.01        -2.37       .01
Foreign Languages   Personal-Social       17.56     15.72        +1.84       .01
                    Mechanical            19.00     22.01        -3.01       .01
Social Studies      Personal-Social       18.72     15.72        +3.00       .01
                    The Sciences          17.72     22.47        -4.75       .01
General             (No significantly high or low mean scores)
Business Education  Business              27.80     19.57        +8.23       .01
                    Mechanical            17.30     22.01        -4.71       .01
                    The Sciences          19.95     22.47        -2.52       .05
Fine Arts           The Arts              27.07     18.67        +8.40       .01
                    The Sciences          18.47     22.47        -4.00       .05
Industrial Arts     Mechanical            23.84     22.01        +1.83       .01
                    Personal-Social       14.25     15.72        -1.47       .05
Vocational Arts     Natural               23.03     20.09        +2.94       .01
                    Mechanical            24.87     22.01        +2.86       .01
                    Personal-Social       13.21     15.72        -2.51       .01
                    Business              17.63     19.57        -1.94       .05


It may be noted that the F ratios obtained for boys exceeded those
obtained for girls in every interest field except Business, in which case the
F ratio obtained for girls exceeded that obtained for boys. It would appear
that boys taking the various high school major sequences represent
somewhat more homogeneous groupings with respect to interests than do girls.
Examination of the data in Tables III and IV reveals that in many
instances the mean scores of students with the various major sequences
were appropriately high in related fields of interest. For example, in the
Sciences field of interest, male students with Mathematics-Science major
sequences had a mean score significantly above the mean of all male
students, and female students with either Science or Mathematics-Science
majors had mean scores significantly above the mean of all female students.


TABLE IV

Significantly High and Low Mean Scores of Female 12th Grade
Students in Fields of Occupational Interest Inventory According to
High School Major Sequences

                                          Mean by   Mean of All  Difference  Significance
High School         Field of              Major     Female       of          of
Major Sequence      Interest              Sequence  Students     Means       Difference
Mathematics &       The Sciences          20.93     16.92        +4.01       .01
  Math.-Science     Mechanical            17.40     14.99        +2.41       .01
                    Business              19.67     24.11        -4.44       .01
Science             Personal-Social       26.55     23.72        +2.83       .01
                    The Sciences          18.82     16.92        +1.90       .05
                    Business              21.75     24.11        -2.36       .01
                    Mechanical            13.55     14.99        -1.44       .01
Foreign Languages   Business              21.62     24.11        -2.49       .01
Social Studies      (No significantly high or low mean scores)
General             (No significantly high or low mean scores)
Bookkeeping         Business              27.87     24.11        +3.76       .01
Clerical            Business              26.39     24.11        +2.28       .01
Salesmanship        Natural               17.46     13.42        +4.04       .05
Stenographic        Business              26.64     24.11        +2.53       .01
                    The Sciences          15.42     16.92        -1.50       .01
Fine Arts           The Arts              29.93     25.92        +4.01       .01
                    Business              20.43     24.11        -3.68       .01
Homemaking          (No significantly high or low mean scores)


Both male and female students with Business Education majors had signifi- 
cantly high scores in the Business field of interest. Both male and female 
students with Fine Arts majors had significantly high scores in the Arts
field of interest. Boys with major sequences in Industrial Arts and Vocational
Arts had significantly high scores in the Mechanical field of interest. 
In those instances where the major sequences did not have any apparent
direct relationship to any of the six fields of interest, the students did not, 
on the average, show any significantly high or low interests. Boys with 
General or Mathematics majors, and girls with Social Studies, General or 
Homemaking majors had no mean scores significantly above or below the 
average of all the students. Girl students with Foreign Language major 
sequences had no significantly high mean scores in any of the interest fields, 
although boys with this major had a significantly high score in the Personal- 
Social field of interest. 

Examination of the below-average mean scores indicated that, in gen- 
eral, girls with any of the major sequences except Business Education 
tended to have low scores in the Business field of interest. 

There was a certain illogic in the pattern of interest exhibited by stu- 
dents with Science majors. While it is true that both boys and girls had 
significantly high scores in the Sciences field of interest, they also both had
above-average scores in the Personal-Social field of interest. Strangely, 
both boys and girls with this major sequence showed a significantly low 
interest in the Mechanical field of interest. Perhaps this indicates an 
interest in people and ideas and a lack of interest in things. 

The most definitive and logical interest patterns were exhibited by boys 
with Industrial Arts or Vocational Arts major sequences. Students with 
Industrial Arts major sequences had a significantly high interest in the 
Mechanical field of interest and a low interest in the Personal-Social interest 
field. Students with Vocational Arts major sequences had above-average 
interests in the Natural and Mechanical fields, and below-average interests 
in the Personal-Social and Business fields of interest. 

The sizes of the groups were so small in some instances that significant 
differences were difficult to obtain. It is possible that further research 
with larger groups of students may indicate definitive interest patterns for 
some of the major sequences that were not revealed in this study. 


Conclusions 


1. Generally, it may be concluded that students taking the various 
major sequences in high school represent somewhat homogeneous group- 
ings with respect to their interest patterns. 

2. In general, boys with the various major sequences are more sharply 
differentiated in their interest patterns than girls. It would appear that 
the high school curriculum gives boys a better opportunity to pursue their 
interests than it does girls. 

3. Students tend to have high interests in those fields of interest
directly related to their major sequences. Where there is no direct
relationship between the major sequences and any of the interest fields, there
apparently is no interest pattern that can be identified. 
4. Students in at least one major sequence, Science, exhibit interest
patterns that are not entirely related to the course offerings in this area 
of study. 


Recommendations 


This study had the somewhat limited objective of finding the relation- 
ships between the interest patterns and major sequences of high school 
students who had already taken the various majors for a period of two or 
three years. This study could not analyze cause and effect: whether these 
students chose their major sequences because of their interests, or developed 
their interests because of taking the majors that they did. 

Additional research is needed to answer at least two questions con- 
cerning the relationship between scores on an interest inventory and major 
sequences: 

(1) What prediction of success can be made if a student chooses a 
major sequence primarily on the basis of interest inventory scores? 

(2) How does additional study in a major sequence affect or change 
an interest inventory score in a related field? 

It is suggested that the answers to these questions may be ascertained 
by administering an interest inventory to A9 or B10 students while they are 
in the process of choosing their high school major sequences, and administering 
the inventory again when the students are A12’s. A follow-up study of this 
nature may indicate both the predictive value of the inventory and the effect 
of additional study on the level of interests. 


Vocational Agriculture . . . 
(Continued from page 178) 


This study shows that students who had a full program of vocational 
agriculture in high school did as well in agriculture at the College of Agri- 
culture at Davis as did students who had not been enrolled in this curricu- 
lum. Counselors and others advising students should be ever mindful of 
the vocational interest of the student. It therefore seems doubtful that a 
college-directed student should be counseled out of agriculture in high 
school if he is really interested in agricultural work as a career. 
Educational Differences Between Fall and Spring Classes 
in a School System With Semi-Annual Promotions 

Howard A. Bowman 


Most teachers in school systems employing semi-annual promotion, par- 
ticularly at the junior and senior high school levels, but also in the upper 
elementary grades, have observed that there appear to be fundamental 
educational differences between the pupils of classes of two successive grade 
levels, e.g., between A-5 and B-6, B-9 and A-9, A-11 and B-12, and so forth. 

Theoretically, the measured achievement averages of two successive 
classes in such a system should be one semester (five school months) apart. 
Also, the measured achievement averages of two successive “A” or “B” sec- 
tions should be two semesters, or one school year (ten school months) 
apart. While it is usually true that two successive “A” or “B” sections, for 
example A-5 and A-6 or B-5 and B-6, are approximately one school year 
apart in measured achievement, it is not usually true that successive grade 
sections are five months apart in measured achievement. 

To illustrate the foregoing statement we may cite an example from the 
results of the fall, 1955 Evaluation Program in the Los Angeles City Ele- 
mentary Schools. The average grade placement in reading comprehension 
for three successive grade sections was as follows: 


B-5 = 5.3 
A-5 = 5.7 
B-6 = 6.3 


It will be observed that the difference between the B-5 and A-5 achieve- 
ment was but four school months, whereas that between the A-5 and B-6 
grades was six school months. Although the B-5 and B-6 classes were 
exactly one school year apart in achievement, the A-5 class was not midway 
between them, but was somewhat less than one-half year advanced over 
the B-5 status. 
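The month arithmetic in the preceding paragraph can be checked with a short Python sketch. It assumes, as the quoted figures imply, that a grade placement of 5.3 means fifth grade, third month, so that one tenth of a grade equals one school month; the `placements` dictionary and `months_apart` helper are illustrative names only.

```python
# Fall 1955 reading comprehension grade placements quoted in the text.
placements = {"B-5": 5.3, "A-5": 5.7, "B-6": 6.3}


def months_apart(lower: str, upper: str) -> int:
    """Difference in school months, taking one tenth of a grade as one month."""
    # round() absorbs floating-point noise such as 4.000000000000004
    return round((placements[upper] - placements[lower]) * 10)


for lower, upper in [("B-5", "A-5"), ("A-5", "B-6"), ("B-5", "B-6")]:
    print(f"{lower} -> {upper}: {months_apart(lower, upper)} months")
```

Under the theoretical five-month spacing, the first two steps would both be 5; the quoted data give 4 and 6, while the B-5 to B-6 span remains a full ten months.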

If a similar table were constructed for a series of three grades at the 
upper junior high school level, it would be likely to exhibit successive 
differences in achievement of two and eight months. At the upper senior high 
school level the situation reaches its maximum, for here the lowest of the 
three grade levels may actually achieve at a higher level than does the 
next higher grade. This phenomenon has been called “looping” because a 
class of a given grade level has achieved higher than, or “looped over,” 
the next higher grade. 

Howard A. Bowman is a member of the Advisory Council on Educational 
Research. Dr. Bowman received his doctorate from the University of Southern 
California, and has been with the Los Angeles City School Districts since 1935, 
serving as a teacher, school counselor, and Supervisor of Measurement and 
Evaluation before assuming his present position of Director of Evaluation and 
Research. 

Figure 1, which illustrates this phenomenon at the senior high school 
level, is a graph of the results of the spring, 1956 senior high school Evalua- 
tion Program, which made use of the Iowa Tests of Educational Develop- 
ment. 


FIGURE 1 

Average Achievement Levels of Senior High School Grades, Spring 1956 

[Graph not reproduced. Columns: Tests 1 through 8, Composite (Tests 1-8), and Test 9. Scale: percentile rank for composite score.] 


Note that grades A-10 and B-11 show quite similar levels of achievement 
except on Test 1 and Test 8. The A-10 class achievement is approaching 
that of the B-11 class, and is about to “loop” over it. 

A comparison of the achievement of the A-11 class with that of the B-12 
class reveals that “looping” has already occurred on Tests 3, 6, and 7 (as 
shown by the broken lines). On all other tests, except possibly Test 1, 
achievement differences between the A-11 and B-12 classes are almost non- 
existent. 

The educational implications of the situation seem rather obvious. Cer- 
tainly they are felt to a greater degree at the higher grade levels than at 
the lower. However, even the teacher who has had A-6 classes for two 
consecutive semesters probably will have felt puzzled as to why one class 
seemed to learn faster than the other. Certainly the high school teacher 
who teaches classes in an A-11 subject term after term will have been not 
only puzzled, but downright confused as to what instructional alterations 
might be made from term to term in order to make the most of the abilities 
of the pupils in these successive classes. 

The confusion is further compounded by the fact that the class which 
performs at norm is virtually always the “large” group; that is, it is the 
class which commences each full grade in September, and which will ulti- 
mately graduate from high school in June. Conversely, the class which falls 
further and further behind until the next following class “loops” beyond it 
in achievement is practically always the “small” class, which originally en- 
tered school in February and will graduate from high school in January. 

Perhaps one important step may be taken in the direction of reducing 
the confusion if we find out how and why “looping” takes place. The 
remainder of this presentation will attempt to do this. For simplification, 
we shall designate the large, September-entering, June-graduating class as 
the “A” class, and the smaller, February-entering, January-graduating class 
as the “B” class. 


Hypothesis and Assumptions 


In order to see how and why “looping” occurs, we must state a 
hypothesis, make certain assumptions, and confirm the hypothesis by 
verifying that the assumptions are true. 

The hypothesis is that this phenomenon of “looping” is a mathematical 
product of the semi-annual promotion system plus certain other character- 
istics of the operation of the informal promotion policy of the Los Angeles 
City School Districts. The assumptions to be made are as follows: 


1. Upon entrance to school in grade 1, the “A” and “B” section pupils 
are essentially unselected as to intelligence. That is, there is no 
basic difference between the learning capacities of these two groups 
of pupils. 

2. About 7/12 of the pupils who enter grade 1 in a given year do so 
in September, while the remaining 5/12 enter grade 1 in February. 


3. The “failure” rate (i.e., proportion of pupils not promoted) is essen- 
tially the same for successive “A” and “B” sections. 
4. Pupils usually do not fail of promotion more than once. 


5. The “acceleration” rate for the “A” sections is not equal to that 
for the “B” sections. 


6. Transfers into the school districts do not vitiate conclusions drawn 
from assumption No. 2. 


Verification of Assumptions 


Assumption No. 1. During the 1954-55 school year the intelligence 
quotients of entering first grade pupils were the subject of an extensive 
study. Such differences as existed, while slight, actually favored the 
February-entering (i.e., “B” group) pupils. The difference between the 
average IQ’s of the September-entering and the February-entering pupils 
was 1.75 IQ points. This would appear to indicate that the February- 
entering group may actually have a slight initial advantage. 

At the third grade level, the average IQ’s of the two groups were 
exactly the same, and this was also true for grades 4 and 5. At grade 6 
the February-entering group fell one point below the September-entering 
group. 

For the purposes of this paper it is only necessary to show that the 
September-entering group is not, for some reason, initially more capable 
than the February-entering group. It is believed that the foregoing data 
demonstrate this point, and assumption No. 1 is regarded as verified. 


Assumption No. 2. The school year is but ten months long, whereas 
pupils’ births are spread reasonably evenly over the twelve months of the 
calendar year. The California Education Code establishes provisions 
for entrance to grade 1 which are such that pupils entering in September 
are drawn from seven calendar months, while those entering in February 
are drawn from five calendar months. 

If 7/12 of a year’s group entered the first grade in September and 
5/12 in February, these fractions could be expressed, respectively, as 58.3 
per cent and 41.7 per cent of the year’s total. The study of first grade 
IQ’s mentioned earlier revealed that the actual proportions entering in Sep- 
tember and February respectively, were 59.2 per cent and 40.8 per cent. 
The difference is less than one per cent in either case, and the second 
assumption may be regarded as verified. 
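The comparison between the 7/12 and 5/12 fractions and the observed entry proportions is simple arithmetic. A minimal Python sketch, assuming births are spread evenly over the twelve months as the text states (the variable names are illustrative):

```python
# Calendar months from which each entering group is drawn (assumption No. 2).
SEPT_MONTHS, FEB_MONTHS = 7, 5

expected_sept = 100 * SEPT_MONTHS / 12   # per cent of a year's entrants
expected_feb = 100 * FEB_MONTHS / 12
observed_sept, observed_feb = 59.2, 40.8  # from the first-grade IQ study

print(f"September: expected {expected_sept:.1f}%, observed {observed_sept}%")
print(f"February:  expected {expected_feb:.1f}%, observed {observed_feb}%")
largest_gap = max(abs(observed_sept - expected_sept), abs(observed_feb - expected_feb))
print(f"Largest discrepancy: {largest_gap:.1f} percentage points")
```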


Assumption No. 3. The third assumption has to do with the constancy 
of the “failure” rate, or rate of non-promotion. While there are no data on 
this subject more recent than 1952, when the last study was made, neither 
is there reason to believe that there has been any material change in the 
situation. 
The 1952 study showed that the failure rate (in elementary school) for 
“A” classes varied from 3.0 per cent to 4.6 per cent, whereas that for “B” 
groups varied from 3.5 per cent to 5.8 per cent, the variations depending 
on grade level. At no point was the difference in failure rate between suc- 
cessive grade levels greater than 1.6 per cent, and it was, in one instance, 
as low as 0.2 per cent. 

Although the per-semester failure rates of two successive groups were never 
exactly the same, they were close enough that, when applied to the total 
enrollments of these successive groups, the actual number of pupils retained 
in a grade for an additional semester was always greater in the larger “A” 
group than in the “B” group. 

The average rate of retention per grade over the twelve semesters of 
elementary school was found to be four per cent. The foregoing para- 
graphs verify the third assumption, but if the actual rates per grade were 
used in the calculations to follow, the arithmetic would be more compli- 
cated than necessary. The rate of four per cent per semester will be taken 
as sufficiently typical to allow its use in the simplified calculations. 


Assumption No. 4. The fourth assumption is that most children who 
are retained are retained but once. Certainly some few children may be 
retained two or even three times. However, there are certain safeguards 
with reference, for example, to the maximum age which may be attained 
by an elementary school pupil before he must be promoted to junior high 
school. That these safeguards are operating effectively is borne out by the 
fact that the average age of the pupils beginning each successive full (i.e., 
two semesters) grade level is almost exactly one year greater than the av- 
erage age of pupils of the grade level one school year below. 


Assumption No. 5. The fifth assumption is that the acceleration (i.e., 
double promotion) rates are not equal for the “A” and “B” sections. If, at 
each grade level, the proportions of pupils accelerated were equal to the 
proportions retained, the numerical effects of these processes would cancel 
each other. 

One way of determining the amount of acceleration is to examine the 
numbers and proportions of bright pupils at each grade level who are at or 
above the mean age for that grade level. Where there has been much 
acceleration the proportions of such pupils will be low. Conversely, higher 
proportions of such pupils will be shown for grade levels where there has 
been less acceleration. 

Data which bear upon this point were obtained in 1955. “A” sections 
contained between three and four times as high a proportion of bright 
“old” pupils as did the “B” sections. In more than 90 per cent of the cases, 
bright pupils in “B” sections had been accelerated, whereas in the “A” 
sections only about 60 to 70 per cent of these pupils had been accelerated. 
The acceleration rates are thus seen to be quite different for the two groups. 
Assumption No. 6. The sixth, and last, assumption is that transfers into 
the school districts do not vitiate conclusions drawn from assumption No. 2. 
The purpose of assumption No. 2 was to show the numerical inequality 
between “A” classes and “B” classes. So long as transfers into the district 
do not reduce this numerical inequality, the argument to follow will hold. 

Most school systems in the United States are wholly or partly on an 
annual promotion basis. That is, such systems do not have “A” and “B” 
classes. Rather, the annual promotion grouping corresponds to the “A” 
classes in Los Angeles. Pupils transferring into Los Angeles schools from 
annual promotion systems will almost invariably enter the “A” section. 

That the foregoing is true is verified by a glance at the enrollment 
figures for the several grades as of November, 1955. In the sixth grade 
about 68 per cent of the pupils were enrolled in the “A” section. In the 
eighth grade the corresponding proportion was 72 per cent, and in the 
twelfth grade 73 per cent. It is thus apparent that transfers into the dis- 
tricts, plus other causes, result in an increase in the numerical discrepancy 
between the “A” and “B” sections. 
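One way to see the growing imbalance is to convert each quoted percentage into the number of “A” pupils per “B” pupil. A minimal Python sketch; note that the first entry is the 59.2 per cent first-grade entry figure from the IQ study rather than a November 1955 enrollment count, and the names are illustrative.

```python
# Per cent of each grade in the "A" (large) section, as quoted in the text.
a_percent = {"grade 1 entrants": 59.2, "grade 6": 68, "grade 8": 72, "grade 12": 73}


def a_per_b(percent_a: float) -> float:
    """Number of "A" pupils for every "B" pupil."""
    return percent_a / (100 - percent_a)


for grade, pct in a_percent.items():
    print(f"{grade}: {a_per_b(pct):.2f} A pupils per B pupil")
```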


The Arithmetic of “Looping” 


The relative sizes of the groups of pupils retained each semester can be 
obtained by multiplying the proportion retained (four per cent) by the 
number of pupils in the grade. Alternatively, one may compute four per 
cent of the proportions that the “A” and “B” sections, respectively, bear to 
the total for the grade. These proportions were earlier shown to be 59.2 
per cent and 40.8 per cent. Four per cent of each of these yields 2.368 and 
1.632 per cent of the grade (or, neglecting decimal points, 2368 and 1632), 
proportions of about three to two. 

Another way of stating the foregoing is that from each “A” section, 
about half again as many pupils are retained as from each “B” section. 
These pupils are lost to the grade section next below, but in return, pupils 
failed from the grade section next higher are gained. Thus a “B” section 
loses pupils to the “A” section next below and gains pupils from the “A” 
section next above. For every two pupils it loses, it gains three. Con- 
versely, an “A” section gains two for every three it loses. 
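The retention arithmetic of the two paragraphs above can be written out directly. This sketch uses the article’s simplifying assumption of a uniform four per cent retention rate per semester and treats the 59.2 and 40.8 per cent section shares as fixed; the constant names are hypothetical.

```python
RETENTION_RATE = 0.04            # simplified per-semester rate from the text
A_SHARE, B_SHARE = 59.2, 40.8    # per cent of the grade in each section

a_retained = RETENTION_RATE * A_SHARE   # pupils held back from an "A" section
b_retained = RETENTION_RATE * B_SHARE   # pupils held back from a "B" section

# A "B" section loses b_retained to the "A" section below it and gains
# a_retained from the "A" section above it; an "A" section is the mirror image.
b_net_gain = a_retained - b_retained

print(f"retained per 100 pupils in the grade: A {a_retained:.3f}, B {b_retained:.3f}")
print(f"ratio A:B = {a_retained / b_retained:.2f}")
print(f"net change in a B section's enrollment per semester: +{b_net_gain:.3f}")
```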

It was shown earlier that retention is generally a “one-shot” affair; 
that is, relatively few pupils are retained more than once. This being the 
case the process outlined in the foregoing paragraph becomes cumulative, 
the “B” section gaining each semester more such pupils than it loses. 
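To illustrate how the one-semester exchange becomes cumulative, here is a toy Python simulation; it is not the author’s own calculation. It assumes a uniform four per cent retention rate each semester applied only to pupils not yet retained (a strict reading of assumption No. 4), entering groups in the 59.2 to 40.8 ratio, twelve semesters of elementary school, and no acceleration or transfers. For each class it reports the share of repeaters on reaching the final semester; `simulate`, `ENTRY_SIZE`, and the other names are hypothetical.

```python
RETENTION_RATE = 0.04                  # per-semester non-promotion rate (simplified)
ENTRY_SIZE = {"A": 59.2, "B": 40.8}    # September ("A") and February ("B") entrants
LEVELS = 12                            # semesters of elementary school


def simulate(steps: int = 40) -> dict:
    """Return the share of repeaters in the final-semester class, by class type."""
    # state[i] holds [kind, first_timers, repeaters] for the class at semester i.
    state = [None] * LEVELS
    top_share = {}
    for t in range(steps):
        new_state = [None] * LEVELS
        held_back = [0.0] * LEVELS
        for i, cls in enumerate(state):
            if cls is None:
                continue
            kind, first_timers, repeaters = cls
            # Only first-time pupils are retained (no one fails twice).
            held_back[i] = RETENTION_RATE * first_timers
            if i + 1 < LEVELS:  # the rest move up; the top class leaves school
                new_state[i + 1] = [kind, first_timers - held_back[i], repeaters]
        entering = "A" if t % 2 == 0 else "B"  # entry alternates Sept./Feb.
        new_state[0] = [entering, ENTRY_SIZE[entering], 0.0]
        for i, cls in enumerate(new_state):
            if cls is not None:
                cls[2] += held_back[i]  # retained pupils join the class behind
        state = new_state
        top = state[-1]
        if top is not None:
            kind, first_timers, repeaters = top
            top_share[kind] = repeaters / (first_timers + repeaters)
    return top_share


shares = simulate()
for kind in ("A", "B"):
    print(f'"{kind}" class, final semester: {shares[kind]:.0%} repeaters')
```

With these settings the sketch reports roughly 29 per cent repeaters in the final-semester “A” class and 47 per cent in the “B” class. The absolute figures depend entirely on the simplified rate and the omission of acceleration and transfers; only the direction of the difference bears on the argument.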

Reference should also be made here to another possibility, namely that 
if as many pupils were accelerated as were retained, changes in class 
characteristics due to either procedure would be largely if not entirely 
nullified. However, many more pupils are retained than are accelerated. 
Moreover, such accelerations as do take place are largely out of the “B” sections 
and into the “A” sections. Retention is clearly the dominant factor. 

Pupils retained do not seem to be markedly inferior in IQ. The average 
IQ’s of “B” sections ultimately drop somewhat below those of “A” sections, 
but not excessively so. The pupils failed are thus not necessarily incapable 
of learning; they just haven’t performed well enough to satisfy their teachers. 
The total effect of the retentions is that over a period of several years the 
“B” sections become “loaded” with less successful pupils, while the “A” 
sections, over a corresponding period of time, markedly reduce their num- 
bers of such pupils. 

It should be remembered, of course, that not all pupils in the “B” sec- 
tions are low achievers, nor are all pupils in the “A” sections high achievers. 
Here we are concerned with averages. These averages mean that the bulk 
of the pupils in “B” sections, particularly at higher grade levels, are achiev- 
ing at a lower level than they should, considering the norm for that par- 
ticular grade. 


Implications 


On many occasions the writer has presented the foregoing information 
to groups of teachers, counselors, and principals. One question which is 
nearly always asked is, “Is this an argument in favor of annual promotion?” 

The response to this question is in the negative. Our sole purpose is to 
explain why test results, in junior and senior high school particularly, 
exhibit the startling discrepancies exemplified in Figure 1. It must be 
remembered that the test results represent the combined achievement of 
many pupils. The grades in which they sit are the artifacts of school 
administration (as is the policy under which they are promoted) and there 
is no inherent relationship between pupil and grade level. Changing from 
semi-annual to annual promotion would relieve the school district and its 
officers of the onerous task of explaining to the public why every other class 
does not perform nearly so well as the classes preceding and following. 
Such a change, however, probably would have no effect upon the perform- 
ances of individual pupils except to the degree that more appropriate grade 
placement might be attained, with reference to the difficulty level of the 
work individuals were expected to do. 








